I’m using Spark connector 2.0 with Spark 2.0.
When querying with RDDFunctions.couchbaseQuery (N1QL queries), the request rate seems to be very low (around 1K per second), whereas RDDFunctions.couchbaseGet reaches a much higher rate (around 100K per second).
I’m using Memory-Optimized Global Secondary Indexes, and when I run the queries manually the response time is ~5 ms.
Has anyone experienced similar issues and found a way to get better performance?
I don’t know whether the Spark connector uses prepared statements, or how it manages and multi-threads N1QL connections and queries.
To get a good throughput estimate, you could write a simple multi-threaded Java app and use the Java SDK to run the same N1QL queries. The maximum throughput you get there is probably the maximum you will get with the Spark connector.
With 5 ms latency, each thread that issues one blocking query at a time can get at most about 200 qps (1000 ms / 5 ms).
Finally, what is your cluster setup (version, query/index/data, # cores, etc.)? @keshav_m
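As a rough illustration of that estimate, here is a minimal sketch of such a test harness using only the JDK. The class name `ThroughputEstimate`, the helpers `expectedQps` and `measureQps`, and the `simulatedQuery` placeholder (a 5 ms sleep) are all made up for this example; to measure a real cluster you would replace the placeholder with an actual blocking N1QL call through the Java SDK. It assumes each thread issues one blocking query at a time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ThroughputEstimate {

    // Upper bound on queries per second when every thread issues
    // one blocking query at a time: threads * (1000 ms / latency).
    public static double expectedQps(int threads, double latencyMs) {
        return threads * (1000.0 / latencyMs);
    }

    // Runs `query` in a loop on `threads` threads for roughly `durationMs`
    // and returns the observed number of completed calls per second.
    public static double measureQps(int threads, long durationMs, Runnable query)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong completed = new AtomicLong();
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(durationMs);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                while (System.nanoTime() < deadline) {
                    query.run();
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(durationMs + 10_000, TimeUnit.MILLISECONDS);
        double elapsedSec = (System.nanoTime() - start) / 1e9;
        return completed.get() / elapsedSec;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for a blocking N1QL query with ~5 ms latency.
        Runnable simulatedQuery = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int threads : new int[] {1, 4, 16}) {
            double measured = measureQps(threads, 2_000, simulatedQuery);
            System.out.printf("threads=%d expected<=%.0f qps, measured=%.0f qps%n",
                    threads, expectedQps(threads, 5.0), measured);
        }
    }
}
```

With the sleep-based placeholder the measured rate should stay below the computed bound and scale roughly with the thread count; against a real query service it will flatten once the query nodes or the connections become the bottleneck, and that plateau is the number to compare with what the Spark job achieves.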
The Spark connector does not use prepared statements right now, and each worker executes queries over 1 socket per query service (the same default as the SDK in general). Many of these parameters are tunable through system properties.
How many query nodes do you have in your Couchbase cluster? And, as @geraldss mentioned, can you share more details on your whole setup?