We have a file of approximately 270 MB containing around 1600 `INSERT` statements, each inserting a single JSON document. Each document is relatively large and includes arrays. Executing this file through `cbq` takes about 45 minutes, which seems excessively slow.
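For reference, this is roughly how we execute the file today (the engine URL, credentials, and file path below are placeholders):

```sh
# -f loads and runs the statements from a file; host/credentials anonymized
cbq -e http://localhost:8091 -u Administrator -p password -f /mnt/ramdisk/inserts.n1ql
```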
Here’s what we’ve tried so far:
- Parallel Execution: Splitting the file and running multiple `cbq` sessions in parallel reduced execution time, but productionizing this approach would require additional time and effort (see the sketches after this list).
- Batch Inserts: We attempted batching by inserting 10 documents in a single `INSERT` statement, but this didn't improve performance (sketch below).
- Using `cbimport`: While `cbimport` is significantly faster, it requires the file to be in JSON format, which involves additional preprocessing (sketch below).
- Ruling Out Network and Hardware Constraints:
  - To eliminate network overhead, we ran the N1QL queries directly on the machine hosting Couchbase Server.
  - We placed the file on a RAM disk to minimize disk I/O overhead.
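The parallel run was along these lines, a minimal sketch assuming one complete statement per line and placeholder credentials:

```sh
# Split into 8 chunks without breaking lines, then run one cbq session per chunk
split -n l/8 /mnt/ramdisk/inserts.n1ql chunk_
for f in chunk_*; do
  cbq -e http://localhost:8091 -u Administrator -p password -f "$f" &
done
wait
```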
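The batched variant relied on N1QL accepting multiple VALUES clauses in a single INSERT; the bucket name, keys, and documents below are placeholders:

```sh
# One INSERT carrying several documents; -s executes a single statement and exits
cbq -e http://localhost:8091 -u Administrator -p password -s '
INSERT INTO `mybucket` (KEY, VALUE)
VALUES ("doc::0001", {"type": "order", "items": [1, 2, 3]}),
VALUES ("doc::0002", {"type": "order", "items": [4, 5, 6]});'
```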
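And for the `cbimport` route, the preprocessing we sketched looks roughly like this; the sed pattern, bucket name, and the `id` field used for key generation are assumptions about our data, not a general recipe:

```sh
# Assumes one complete statement per line, shaped exactly as:
#   INSERT INTO `mybucket` (KEY, VALUE) VALUES ("<key>", {<doc>});
# Strip the SQL wrapper, leaving one JSON document per line.
sed -E 's/^INSERT .*VALUES \("[^"]+", (\{.*\})\);$/\1/' \
    /mnt/ramdisk/inserts.n1ql > /mnt/ramdisk/docs.jsonl

# Line-delimited import; -g derives keys from a hypothetical "id" field in each
# document (adjust to preserve the original keys, or use #UUID# as a fallback)
cbimport json -c http://localhost:8091 -u Administrator -p password \
  -b mybucket -d file:///mnt/ramdisk/docs.jsonl -f lines -g 'key::%id%' -t 8
```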
Despite these efforts, the performance improvement has been minimal. We are keen to understand why processing takes so long; the documents are large, but not so large as to justify such delays.
Is there any parameter or option in `cbq` that could help speed up the process? We've already tried disabling logging to `stdout` and other logs, but that didn't help. Any insights would be greatly appreciated!