Hello!
We are working with Couchbase 7.0 Beta using scala-client - 1.1.3 and couchbase-transactions - 1.1.6.
We have a use case, where we have to store files (Images and PDFs) in the Couchbase database.
These are saved in a RAW format (using RawBinaryTranscoder).
In some cases we would like to include these operations in a transaction.
Unfortunately we could not find a way to do this.
In the Java/Scala SDK we could specify the transcoder, but this is not possible with the transactions API.
Is there any way to include RAW documents in transactions? Is there any future plan to support this?
Thank you in advance,
Martin
Hi @horvath-martin
Transactions support JSON documents only; there is no built-in support for raw/binary documents. It is not currently on the roadmap either, but never say never! Community feedback like this helps us know where to expand our feature set in the future.
As a workaround for now, is it possible for you to encode the binary document into JSON, e.g. with base64?
@graham.pople I conducted a quick evaluation of binary serialization into JSON using base64. Literature suggests base64 instead of more sophisticated formats (like base85) due to its speed and ease of compatibility/implementation.
The SDK request_encoding traces suggest that serializing a Scala case class (that contains an Array[Byte]) to a JSON document with a binary array size ranging from 5 to 15 MB using base64 takes anything between roughly 10 and 100 milliseconds on an i7 10th Gen.
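For readers who want to try the base64 workaround, here is a minimal sketch of the encoding step only, using java.util.Base64 from the JDK. The names StoredFile, Base64Json, wrap, unwrap, encodedLength and timedMillis are made up for illustration and are not part of any Couchbase SDK; how the resulting case class is turned into JSON depends on whichever JSON library you already use with the SDK.

```scala
import java.util.Base64

// Hypothetical wrapper type: the binary payload travels as a base64 string,
// so the whole value can be serialized as an ordinary JSON document.
final case class StoredFile(name: String, contentBase64: String)

object Base64Json {
  // Encode raw bytes as base64 text suitable for a JSON string field.
  def wrap(name: String, bytes: Array[Byte]): StoredFile =
    StoredFile(name, Base64.getEncoder.encodeToString(bytes))

  // Recover the original bytes from the wrapper.
  def unwrap(file: StoredFile): Array[Byte] =
    Base64.getDecoder.decode(file.contentBase64)

  // Length of padded base64 output for n input bytes: 4 * ceil(n / 3).
  def encodedLength(n: Int): Long = 4L * ((n.toLong + 2) / 3)

  // Crude wall-clock timing helper; not a substitute for a real benchmark.
  def timedMillis[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }
}
```

Something like `Base64Json.timedMillis(Base64Json.wrap("scan.pdf", bytes))` gives a rough per-call figure for the encoding alone. Note that base64 grows the payload by about one third (see encodedLength), so it is worth checking the encoded size against the server's maximum document size.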
Hi @zoltan.zvara
10-100 millis surprises me, that seems pretty slow. Still, it’s not completely unworkable - is that performance suitable for your workload?
For technical reasons it’s non-trivial for us to add binary support - basically we depend on the KV sub-document API, which is JSON-only as that was what it was originally designed for. So anything we could do, without a much larger engineering effort to support binary with sub-document at least, would be similar to the kinds of approaches you’re currently exploring - finding a fast serialization approach on the client-side.
An alternative approach might be to insert the binary document using regular KV before the transaction, and then do the transaction with a link to that inserted document, deleting the doc again if the transaction fails. That’s assuming of course that it’s ok for the document to not have ACID guarantees. I don’t know your use-case, but sometimes for binary files (such as images) it is less crucial that they are not visible pre-transaction, guaranteed to be rolled back if the transaction fails, etc.
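The shape of that alternative can be sketched without any SDK calls. Everything below is a hypothetical stand-in: BinaryStore represents a regular (non-transactional) KV collection, InMemoryBinaryStore is a test double, and the txn function represents whatever transactional work stores the link. It only illustrates the ordering (insert first, run the transaction, delete on failure), not the actual Couchbase API.

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical abstraction over a regular, non-transactional KV collection.
trait BinaryStore {
  def insert(id: String, bytes: Array[Byte]): Unit
  def remove(id: String): Unit
}

// In-memory stand-in so the sketch can be run without a cluster.
final class InMemoryBinaryStore extends BinaryStore {
  val docs: mutable.Map[String, Array[Byte]] = mutable.Map.empty
  def insert(id: String, bytes: Array[Byte]): Unit = docs.update(id, bytes)
  def remove(id: String): Unit = { docs.remove(id); () }
}

object LinkedBinary {
  // 1. Insert the binary document outside the transaction.
  // 2. Run the transactional work, passing it the binary document's id as the link.
  // 3. If that work fails, delete the binary document again (best effort).
  def storeWithLink[A](store: BinaryStore, binaryId: String, bytes: Array[Byte])(
      txn: String => Try[A]
  ): Try[A] =
    Try(store.insert(binaryId, bytes)).flatMap { _ =>
      val result = Try(txn(binaryId)).flatten
      // A failed cleanup is swallowed here and would leave an orphaned binary.
      if (result.isFailure) Try(store.remove(binaryId))
      result
    }
}
```

As noted above, the binary document is visible before the transaction commits, and a crash between the failed transaction and the cleanup leaves an orphan, so this only fits files that do not need ACID guarantees.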
@graham.pople yes that performance is suitable for the workload.
The alternative solution that you suggest is something that we already do for binary files - images in our system, for example, are stored that way.
However, some 10-20 MB files must participate in an ACID scheme, so we currently base64-encode them into a JSON document. Note that while running the perf test, my laptop (i7 10th gen) does not properly isolate the CPU and I don’t have statistically relevant data from the traces, so I would place my 50 cents on the lower end of the 10-to-100 ms range that I observed. On a Xeon E5-2600 series v4 it should then be around 5 ms or less. We will see, and I’ll make an attempt to provide good data from production.
Thanks for your valuable hints!