We observe very slow write performance using the Couchbase Spark Connector.
Connector 2.2.0 currently uses the async bucket and inserts documents one by one. We use the Connector with Spark Streaming, where 1,000-5,000 documents are supposed to be inserted per second. The documents go through expensive models before insertion, but writing nevertheless dominates the run time of each micro-batch.
We have very few indexes on the documents: practically only one, on the document type (cardinality of 2).
Are there any tips for improving this? Should we rewrite the insertion to use bulk operations?
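To illustrate what we mean by bulk operations, here is a minimal, self-contained Java sketch. It deliberately does not use the Couchbase API. `insertOne` is a hypothetical stand-in that simulates a fixed per-document round trip with a sleep, and the batch size of 64 is an assumption. The sketch contrasts waiting for each write before issuing the next one with issuing a whole batch concurrently and waiting once per batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BulkWriteSketch {
    // Hypothetical stand-in for a network write: each call takes roughly latencyMs.
    static CompletableFuture<String> insertOne(String id, long latencyMs, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(latencyMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return id;
        }, pool);
    }

    // One by one: wait for each write to finish before starting the next.
    static long writeSequential(List<String> ids, long latencyMs, ExecutorService pool) {
        long start = System.nanoTime();
        for (String id : ids) {
            insertOne(id, latencyMs, pool).join();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Batched: start up to batchSize writes at once, then wait for all of them.
    static long writeBatched(List<String> ids, int batchSize, long latencyMs, ExecutorService pool) {
        long start = System.nanoTime();
        for (int i = 0; i < ids.size(); i += batchSize) {
            List<CompletableFuture<String>> inFlight = new ArrayList<>();
            for (String id : ids.subList(i, Math.min(i + batchSize, ids.size()))) {
                inFlight.add(insertOne(id, latencyMs, pool));
            }
            CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            ids.add("doc-" + i);
        }
        ExecutorService pool = Executors.newFixedThreadPool(64);
        long seq = writeSequential(ids, 5, pool);
        long batched = writeBatched(ids, 64, 5, pool);
        pool.shutdown();
        System.out.println("sequential ms: " + seq);
        System.out.println("batched ms: " + batched);
    }
}
```

With a fixed per-write latency, the sequential loop pays that latency once per document, while the batched loop pays it roughly once per batch. In Spark, something like this would presumably run inside `foreachPartition`, with the real SDK call replacing `insertOne`. The batch size caps how many requests are in flight at once.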
Cheers,
Zoltán