Slow write performance using Couchbase Spark Connector 'saveToCouchbase'

zoltan.zvara · November 17, 2017, 11:46am

We observe very slow write performance using the Couchbase Spark Connector.

Connector 2.2.0 currently uses the async bucket and inserts documents one by one. We use the Connector with Spark Streaming, where 1000-5000 documents are supposed to be inserted per second. Documents go through expensive models before insertion, but even so, writing dominates the run time of the whole micro-batch.

We have very few indexes on documents, practically one index on the document type (cardinality of 2).

Are there any tips for improving this? Would rewriting the insertion to use bulk operations help?

Cheers,
Zoltán

zoltan.zvara · November 17, 2017, 12:35pm

Update:

We are now using UPSERTs, which is better, but still not acceptable.
What are the suggested values for number-of-executors, executor-cores, or the number of writer partitions, given the machine resources? I see that there is only one CouchbaseConnection per executor. Is that right?
CPU and cluster IO are not fully utilized.

Thanks for any tips,
Cheers,
Zoltán
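
One way to reason about the executor and partition question above is Little's law: sustained throughput is roughly the number of writes in flight divided by the latency of one write. The sketch below is only back-of-the-envelope arithmetic, not a description of how the Connector schedules writes. The 2 ms round-trip latency is an assumed figure, not a measurement from this thread, and the function names are hypothetical.

```python
import math


def max_docs_per_sec(in_flight_ops: int, latency_ms: float) -> float:
    """Little's law: throughput = operations in flight / latency per operation."""
    return in_flight_ops / (latency_ms / 1000.0)


def in_flight_needed(target_docs_per_sec: float, latency_ms: float) -> int:
    """Smallest number of concurrent writes that reaches the target rate."""
    return math.ceil(target_docs_per_sec * latency_ms / 1000.0)


# Assumed figure for illustration only: 2 ms per write round trip.
ASSUMED_LATENCY_MS = 2.0

one_at_a_time = max_docs_per_sec(1, ASSUMED_LATENCY_MS)
needed = in_flight_needed(5000, ASSUMED_LATENCY_MS)

print(f"1 write in flight: at most {one_at_a_time:.0f} docs/s")
print(f"5000 docs/s needs at least {needed} writes in flight")
```

Under that assumption, a single stream of strictly sequential writes would top out well below the 1000-5000 documents per second target, while CPU and IO would stay mostly idle. Measuring the actual per-write latency and plugging it in gives a lower bound on the total concurrency (writer partitions times in-flight writes per partition) to aim for.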
