Issues on querying data

I have a few issues while querying data from a Couchbase collection.
I have 10M documents with doc IDs from 1 to 10M.
I ran this SELECT query: SELECT *, META(doc).id AS docId FROM src.scp1.col1 AS doc;

  • Here I get only part of the documents.
    For example, I have 10M records in my cluster, and the queried items come back as

doc 1
doc 10
doc 100
doc 1000
doc 10000

and so on. I don't get doc 2, 3, etc.

Does this have something to do with my query?

The results are expected.
Your query has no predicate, which means it uses the primary index.
You mentioned the doc IDs are 1 to 10M (as strings). The primary index is built on the document key, and its keys are sorted as strings, so "10" comes before "2". Results are returned in that sorted order.

If you need 1, 2, 3, etc., the best option is to prefix the keys with leading zeros,
i.e. 000000001, 000000002, 000000003, …
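
To illustrate the ordering, here is a small sketch in plain Java. It assumes that Java's natural String order is a reasonable stand-in for how the index orders these digit-only keys; KeyOrderDemo, paddedKey and sortedCopy are made-up names for this example, not part of any SDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class KeyOrderDemo {
    // Zero-pad a numeric id to a fixed width so that string order matches numeric order.
    static String paddedKey(long id, int width) {
        return String.format("%0" + width + "d", id);
    }

    // Return a sorted copy using natural String order (assumed stand-in for index order).
    static List<String> sortedCopy(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> plain = new ArrayList<>();
        List<String> padded = new ArrayList<>();
        for (long id : new long[] {1, 2, 3, 10, 100}) {
            plain.add(Long.toString(id));
            padded.add(paddedKey(id, 9)); // width 9, as in 000000001
        }
        System.out.println(sortedCopy(plain));  // [1, 10, 100, 2, 3]
        System.out.println(sortedCopy(padded)); // [000000001, 000000002, 000000003, 000000010, 000000100]
    }
}
```

With the unpadded keys, 10 and 100 sort ahead of 2; with the padded keys, string order and numeric order agree.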

Hey @vsr1, thanks for the answer.

I have a few questions for you:

  • Are there any batch policies in the Java SDK that would let me query all the records in a collection batch by batch? I'm not able to query all 10M records in one go using the Java SDK.

  • Or is there a query I can use to fetch documents 1000 at a time, up to 10M?

Thanks in advance.

You may want to review:

and implement pagination using OFFSET and LIMIT SQL clauses.

HTH.
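
A rough sketch of what OFFSET/LIMIT paging could look like for the query in this thread, written as a plain Java helper that only builds the statement text. OffsetPaging and pageStatement are hypothetical names, and the ORDER BY is an addition of this sketch so that consecutive pages are taken from the same ordering; running each statement through the SDK is left out.

```java
class OffsetPaging {
    // Build the statement for one page of the original query.
    // ORDER BY is added (an assumption of this sketch) so that pages do not overlap.
    static String pageStatement(int pageSize, long offset) {
        return "SELECT *, META(doc).id AS docId "
             + "FROM src.scp1.col1 AS doc "
             + "ORDER BY META(doc).id "
             + "LIMIT " + pageSize + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        int pageSize = 1000;
        // Print the statements for the first three pages only.
        for (long offset = 0; offset < 3L * pageSize; offset += pageSize) {
            System.out.println(pageStatement(pageSize, offset));
        }
    }
}
```

Keep in mind that a large OFFSET still has to skip over all the preceding rows, so later pages tend to get slower as the offset grows.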

In a real use case (where document keys might be different), you can follow what @dh suggested.

In your case, you mentioned the document keys are the numbers 1 to 10M as strings, i.e. you already know the document keys and are not filtering anything. So you can generate the document keys from the sequence number for each batch, read them directly via KV, and avoid the query service altogether.

Also, fetching 10M documents via query can be expensive. You can get just the document keys via query and use the SDK to get the actual documents from KV.
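
A minimal sketch of the key generation, assuming the keys really are the unpadded numbers "1" to "10000000". KeyBatches and batchKeys are made-up names; the KV read itself is only indicated in a comment so the example runs without a cluster.

```java
import java.util.ArrayList;
import java.util.List;

class KeyBatches {
    // Keys for one batch: start .. start + batchSize - 1, capped at maxId.
    static List<String> batchKeys(long start, int batchSize, long maxId) {
        List<String> keys = new ArrayList<>(batchSize);
        long end = Math.min(maxId, start + batchSize - 1);
        for (long id = start; id <= end; id++) {
            keys.add(Long.toString(id));
        }
        return keys;
    }

    public static void main(String[] args) {
        long maxId = 10_000_000L;
        int batchSize = 1000;
        // Show the first three batches; a full run would loop while start <= maxId.
        for (long start = 1; start <= 3L * batchSize; start += batchSize) {
            List<String> keys = batchKeys(start, batchSize, maxId);
            // In a real program, each key would be fetched here with a KV get
            // on the collection (for example collection.get(key) in the Java SDK).
            System.out.println(keys.get(0) + " .. " + keys.get(keys.size() - 1));
        }
    }
}
```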

SELECT RAW META(doc).id
FROM `src`.scp1.col1 AS doc
WHERE META(doc).id > $id
ORDER BY META(doc).id
LIMIT 10000;

Start with $id = "".
On each following iteration, pass the last id returned by the previous batch.
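
The iteration can be sketched in Java as follows. To keep it runnable without a cluster, a sorted in-memory set stands in for the primary index and nextPage stands in for running the key query above with $id bound to the last key seen. KeysetPagingDemo, nextPage and scanAll are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class KeysetPagingDemo {
    // Stand-in for the key query: keys strictly greater than lastId,
    // in key order, at most `limit` of them.
    static List<String> nextPage(NavigableSet<String> index, String lastId, int limit) {
        List<String> page = new ArrayList<>(limit);
        for (String key : index.tailSet(lastId, false)) {
            if (page.size() == limit) break;
            page.add(key);
        }
        return page;
    }

    // Walk the whole key space page by page, starting from the empty string.
    static List<String> scanAll(NavigableSet<String> index, int limit) {
        List<String> all = new ArrayList<>();
        String lastId = "";
        while (true) {
            List<String> page = nextPage(index, lastId, limit);
            if (page.isEmpty()) break;            // no keys left
            all.addAll(page);                     // real code would KV-get these keys
            lastId = page.get(page.size() - 1);   // resume after the last key seen
        }
        return all;
    }

    public static void main(String[] args) {
        NavigableSet<String> index = new TreeSet<>();
        for (int i = 1; i <= 25; i++) index.add(Integer.toString(i));
        System.out.println(scanAll(index, 10).size()); // 25
    }
}
```

Unlike OFFSET, each iteration here resumes from the last key rather than skipping over earlier rows, so the cost per batch should stay roughly constant.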