Issues on querying data

Karthikeyan · October 3, 2022, 4:09am

I have a few issues while querying data from a Couchbase collection.
I have 10M documents with doc IDs from 1 to 10M.
I ran this select query: SELECT *, META(doc).id AS docId FROM src.scp1.col1 AS doc;

Here I get only some of the documents.
For example, I have 10M records in my cluster, and the items returned by the query are:

doc 1
doc 10
doc 100
doc 1000
doc 10000

and so on. I don't get doc 2, 3, etc.

Is there something wrong with my query?

vsr1 · October 3, 2022, 5:01am

The results are expected.
Your query has no predicate, which means it uses the primary index.
You mentioned doc IDs 1 to 10M (as strings). The primary index is based on the document key, and the index keys are sorted, so retrieval returns them in sorted string order. As strings, "1", "10", "100", "1000" all sort before "2".

If you need 1, 2, 3, etc., the best option is to prefix the keys with leading zeros,
i.e. 000000001, 000000002, 000000003, …
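
To see why the order changes, here is a minimal sketch in plain Java (no Couchbase involved). It assumes the index orders these all-digit keys the same way Java compares strings, and it uses a pad width of 9 only because that matches the example above. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class KeyOrderDemo {
    // Builds keys for from..to as strings and sorts them as strings,
    // which is how string document keys are ordered.
    static List<String> sortedAsStrings(int from, int to, boolean padded) {
        List<String> keys = new ArrayList<>();
        for (int i = from; i <= to; i++) {
            // Width 9 is an assumption taken from the 000000001 example.
            keys.add(padded ? String.format("%09d", i) : Integer.toString(i));
        }
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        // Unpadded: prints [1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(sortedAsStrings(1, 12, false));
        // Padded: string order now matches numeric order.
        System.out.println(sortedAsStrings(1, 12, true));
    }
}
```

With padding, every key has the same length, so comparing character by character gives the same result as comparing the numbers.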

Karthikeyan · October 3, 2022, 5:40am

Hey @vsr1, thanks for the answer.

I have a few questions for you.

Are there any batch policies in the Java SDK that I can use to query all the records in a collection batch by batch? I'm not able to query all 10M records in one go using the Java SDK.
Or is there a query I can use to fetch documents 1000 at a time, up to 10M?

Thanks in advance.

dh · October 3, 2022, 8:16am

You may want to review:

and implement pagination using OFFSET and LIMIT SQL clauses.

HTH.
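
A minimal sketch of what OFFSET/LIMIT paging could look like, written as a plain Java helper that only builds the statement text. The keyspace and the select list are taken from the question; the helper name, the zero-based page index, and the ORDER BY clause are assumptions added here so that consecutive pages line up. Running the statement through the SDK is left out.

```java
class OffsetPaging {
    // Builds the statement for one page; pageIndex is zero-based.
    static String pageQuery(long pageIndex, int pageSize) {
        long offset = pageIndex * pageSize;
        return "SELECT *, META(doc).id AS docId "
             + "FROM `src`.scp1.col1 AS doc "
             // A stable order is assumed so pages do not overlap or skip rows.
             + "ORDER BY META(doc).id "
             + "LIMIT " + pageSize + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        // Print the statements for the first three pages of 1000 documents.
        for (long page = 0; page < 3; page++) {
            System.out.println(pageQuery(page, 1000));
        }
    }
}
```

One thing to keep in mind with this approach: the skipped entries still have to be scanned, so pages deep into a 10M-document collection are likely to get slower as the offset grows.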

vsr1 · October 3, 2022, 12:52pm

In a real use case (where document keys might be different) you can do as @dh suggested.

In your case you mentioned the document keys are the numbers 1 to 10M as strings, i.e. you already know the document keys and are not filtering anything. So you can generate the document keys from the sequence numbers for each batch, read the documents directly with KV, and avoid the query service altogether.
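
A rough sketch of the key-generation part, in plain Java. It assumes the keys are the unpadded decimal strings "1" to "10000000" as described in the question; the class name, method name, and batch size are made up for illustration. Each generated key would then go to a KV read (for example `collection.get(key)` in the Java SDK), which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

class KeyBatches {
    // Returns the document keys for one batch; batchIndex is zero-based.
    // Keys are assumed to be unpadded decimal strings starting at "1".
    static List<String> batchKeys(long batchIndex, int batchSize, long maxKey) {
        long first = batchIndex * batchSize + 1;
        long last = Math.min(first + batchSize - 1, maxKey);
        List<String> keys = new ArrayList<>();
        for (long k = first; k <= last; k++) {
            keys.add(Long.toString(k));
        }
        return keys; // empty once batchIndex runs past maxKey
    }

    public static void main(String[] args) {
        long maxKey = 10_000_000L;
        int batchSize = 1000;
        // Show the key range of the first three batches only.
        for (long batch = 0; batch < 3; batch++) {
            List<String> keys = batchKeys(batch, batchSize, maxKey);
            System.out.println(keys.get(0) + " .. " + keys.get(keys.size() - 1));
        }
    }
}
```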

Also, fetching 10M documents via query can be expensive. You can get the document keys via query and use the SDK to get the actual documents from KV.

SELECT RAW META(doc).id
FROM `src`.scp1.col1 AS doc
WHERE META(doc).id > $id
ORDER BY META(doc).id
LIMIT 10000;

Start with $id = "".
On each following iteration, set $id to the last id returned by the previous one. Stop when a query returns no rows.
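
The iteration could be sketched like this in plain Java. The page fetcher is a hypothetical stand-in: in a real application it would wrap the SDK's query call with $id bound to the last id seen. Here it is replaced by an in-memory sorted set so the loop can run on its own. All names in the block are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.BiFunction;

class KeysetPaging {
    // fetchPage.apply(lastId, limit) is expected to return up to `limit`
    // ids that are greater than lastId, in ascending order.
    static List<String> allIds(BiFunction<String, Integer, List<String>> fetchPage, int limit) {
        List<String> all = new ArrayList<>();
        String lastId = ""; // the empty string sorts before every real key
        while (true) {
            List<String> page = fetchPage.apply(lastId, limit);
            if (page.isEmpty()) {
                break; // nothing left after lastId
            }
            all.addAll(page);
            lastId = page.get(page.size() - 1); // resume after the last id seen
        }
        return all;
    }

    // In-memory stand-in for the query, backed by a sorted set of keys.
    static BiFunction<String, Integer, List<String>> inMemory(TreeSet<String> keys) {
        return (lastId, limit) -> {
            List<String> page = new ArrayList<>();
            for (String k : keys.tailSet(lastId, false)) {
                if (page.size() == limit) {
                    break;
                }
                page.add(k);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>();
        for (int i = 1; i <= 25; i++) {
            keys.add(Integer.toString(i));
        }
        List<String> ids = allIds(inMemory(keys), 10);
        System.out.println(ids.size() + " ids, starting with " + ids.subList(0, 3));
    }
}
```

Because each page starts from the last id rather than from a row count, the cost of a page should not grow with how far into the collection it is.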
