sys.getsizeof will tell you how much memory the object occupies in Python, but not how large its JSON-encoded form will be.
While Couchbase may compress data internally, the 20MB limit applies to the uncompressed size.
The simple approach to checking the document size would be to serialize the data yourself to JSON and check its resultant size:
size = len(json.dumps(data))
The size then tells you how large your object is, in bytes, as far as Couchbase is concerned (with its default settings json.dumps emits only ASCII, so the character count equals the byte count). “Splitting” JSON would be a bit complex since JSON is a structured data format; you can split the encoded form, but then no section is readable without the others (which means the pieces cannot be used with views or N1QL).
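The character count and the byte count only coincide while the encoded text is pure ASCII. As a minimal sketch (the helper name json_size_bytes is made up for illustration and is not part of any library), the byte count can be measured explicitly by encoding the text to UTF-8 first:

```python
import json

def json_size_bytes(data, ensure_ascii=True):
    """Return the number of bytes in the UTF-8 encoded JSON form of data.

    Hypothetical helper for illustration; not part of the Couchbase SDK.
    """
    text = json.dumps(data, ensure_ascii=ensure_ascii)
    return len(text.encode("utf-8"))

doc = {"name": "caf\u00e9"}  # contains one non-ASCII character

# Default: non-ASCII characters are escaped, so the text is pure ASCII
escaped = json.dumps(doc)

# ensure_ascii=False keeps the character as-is; it takes 2 bytes in UTF-8
raw = json.dumps(doc, ensure_ascii=False)
```

If your serializer is configured to keep non-ASCII characters, len() on the string undercounts, so measuring the encoded bytes is the safer habit.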
Another issue with the simple approach is that, assuming data is a dictionary, the following serializes the JSON twice:
size = len(json.dumps(data))
cb.upsert(key, data) # Serializes data implicitly
If you only use the library casually this may not be an issue, but JSON serialization in Python (using the default JSON libraries) can be expensive, especially for large objects. To avoid serializing twice you may:
- Store your data as “raw bytes” (FMT_RAW or FMT_UTF8), and lose the ability to have the client automatically decode the JSON on get() calls
- Implement a custom Transcoder (http://pythonhosted.org/couchbase/api/transcoder.html) to which you can indicate (via a wrapper object) that you are passing an already-serialized JSON string, which should be treated like JSON but need not be serialized a second time. Something like:
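from couchbase import FMT_JSON, FMT_UTF8
from couchbase.transcoder import Transcoder

ALREADY_JSON = object()  # sentinel "format" for pre-serialized JSON strings

class MyTranscoder(Transcoder):
    def encode_value(self, value, format):
        if format is ALREADY_JSON:
            # Already JSON text: only encode to UTF-8, then flag it as JSON
            value, _ = super(MyTranscoder, self).encode_value(value, FMT_UTF8)
            return value, FMT_JSON
        return super(MyTranscoder, self).encode_value(value, format)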
This works by defining a custom “format” (ALREADY_JSON) which can be used to signal to your custom transcoder that it should not encode to JSON again.
The above (with the transcoder) is just an optimization, however; the default transcoder will still function correctly if you pass the original object.
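To tie the pieces together, here is a rough sketch of serializing once and checking the result against the limit before storing. The names serialize_once and MAX_DOC_BYTES are hypothetical, and reading “20MB” as 20 * 1024 * 1024 bytes is an assumption:

```python
import json

# Assumption: the 20MB limit is read as 20 * 1024 * 1024 bytes.
MAX_DOC_BYTES = 20 * 1024 * 1024

def serialize_once(data, limit=MAX_DOC_BYTES):
    """Serialize data to JSON a single time and verify it fits the limit.

    Hypothetical helper. Returns (text, size_in_bytes) and raises
    ValueError if the encoded document exceeds the limit.
    """
    text = json.dumps(data)
    size = len(text.encode("utf-8"))
    if size > limit:
        raise ValueError("document is %d bytes, limit is %d" % (size, limit))
    return text, size

text, size = serialize_once({"user": "alice", "visits": 3})

# With a transcoder like the one above set on the connection, the string
# could then be stored without a second serialization, along the lines of:
#   cb.upsert(key, text, format=ALREADY_JSON)
```

The point of returning the text is that the same string used for the size check is the one that gets stored, so the data is only serialized once.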