I have a few issues while querying data from a Couchbase collection.
I have 10M documents with doc IDs from 1 to 10M.
I ran a SELECT query: SELECT *, META(doc).id AS docId FROM src.scp1.col1 AS doc;
Here I get only partial documents.
For example, I have 10M records in my cluster, and I get the items queried as
The results are expected.
You don't have any predicate in the query, which means it is using the primary index.
You mentioned doc IDs 1 to 10M (as strings). The primary index is based on the document key, and the index keys are sorted. Retrieval therefore returns them in sorted string order, so "10" and "100" come before "2".
If you need 1, 2, 3, etc., the best option is to prefix the keys with leading zeros,
i.e. 000000001, 000000002, 000000003, …
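To illustrate why the leading zeros help, here is a minimal Java sketch that sorts a few keys as plain strings and then as zero-padded strings. The class and method names (KeyOrderDemo, paddedKey, sorted) are made up for this example, and the width of 9 simply matches the keys shown above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class KeyOrderDemo {
    // Pads a numeric id with leading zeros to a fixed width.
    static String paddedKey(long id, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", id);
    }

    // Returns a copy of the keys in lexicographic (string) order,
    // which is how string document keys compare.
    static List<String> sorted(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> plain = new ArrayList<>();
        List<String> padded = new ArrayList<>();
        for (long id : new long[] {1, 2, 3, 10, 100}) {
            plain.add(Long.toString(id));
            padded.add(paddedKey(id, 9));
        }
        // Plain string keys: "10" and "100" sort before "2".
        System.out.println(sorted(plain));   // [1, 10, 100, 2, 3]
        // Padded keys: string order now matches numeric order.
        System.out.println(sorted(padded));  // [000000001, 000000002, 000000003, 000000010, 000000100]
    }
}
```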
Are there any batch policies in the Java SDK with which I can query all the records in a collection batch by batch? I'm not able to query all 10M records in one go using the Java SDK.
Or is there a query with which I can fetch documents 1000 at a time, up to 10M?
In a real use case (where document keys might be different) you can follow what @dh suggested.
In your case you mentioned the document keys are the numbers 1 to 10M as strings, i.e. you already know what the document keys are and you are not filtering anything. So you can generate the document keys from the sequence number for each batch, use KV reads directly, and avoid the query service altogether.
Also, getting 10M documents via query can be expensive. You can get the document keys via query and use the SDK to get the actual documents from KV.
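A rough sketch of the key generation part, in plain Java so it runs without a cluster. KeyBatches and batch are hypothetical names, and the batch size of 1000 is taken from the question. The actual KV read is only indicated in a comment; it is assumed to be a call such as collection.get(key) in the Couchbase Java SDK.

```java
import java.util.ArrayList;
import java.util.List;

class KeyBatches {
    // Returns the keys for one batch: ids from start (inclusive) up to
    // min(start + batchSize - 1, maxId), as plain decimal strings.
    static List<String> batch(long start, int batchSize, long maxId) {
        List<String> keys = new ArrayList<>();
        long end = Math.min(start + batchSize - 1, maxId);
        for (long id = start; id <= end; id++) {
            keys.add(Long.toString(id));
        }
        return keys;
    }

    public static void main(String[] args) {
        long maxId = 10_000_000L;
        int batchSize = 1000;
        long total = 0;
        for (long start = 1; start <= maxId; start += batchSize) {
            List<String> keys = batch(start, batchSize, maxId);
            // Assumption: each key in this batch would be fetched here with
            // a KV read (for example collection.get(key)) instead of a query.
            total += keys.size();
        }
        System.out.println("keys generated: " + total);
    }
}
```

If the keys are zero-padded as suggested earlier, only the string conversion inside batch would need to change.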
SELECT RAW META(doc).id
FROM `src`.scp1.col1 AS doc
WHERE META(doc).id > $id
ORDER BY META(doc).id
LIMIT 10000;
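The paging loop around that query could look roughly like the sketch below. To keep it runnable without a cluster, the query is replaced by a hypothetical in-memory stand-in (fakeQuery); in real code that function would run the statement above with $id bound to the last key seen. KeysetPager and allKeys are made-up names, and the loop assumes each page comes back ordered by document key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.BiFunction;

class KeysetPager {
    // Collects all keys by repeatedly asking fetchPage for up to `limit`
    // keys strictly greater than the last key seen. Assumes limit > 0 and
    // that every page is sorted by key.
    static List<String> allKeys(BiFunction<String, Integer, List<String>> fetchPage, int limit) {
        List<String> all = new ArrayList<>();
        String lastId = ""; // the empty string sorts before every non-empty key
        while (true) {
            List<String> page = fetchPage.apply(lastId, limit);
            all.addAll(page);
            if (page.size() < limit) {
                break; // a short (or empty) page means there is nothing left
            }
            lastId = page.get(page.size() - 1);
        }
        return all;
    }

    // Hypothetical stand-in for the query: returns up to `limit` keys
    // greater than afterId, in sorted string order.
    static BiFunction<String, Integer, List<String>> fakeQuery(TreeSet<String> keys) {
        return (afterId, limit) -> {
            List<String> page = new ArrayList<>();
            for (String key : keys.tailSet(afterId, false)) {
                if (page.size() == limit) {
                    break;
                }
                page.add(key);
            }
            return page;
        };
    }
}
```

Each page of keys can then be handed to KV reads, so the query service only ever returns document keys and never the full documents.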