I have a bucket containing 1 million small documents. What I would like to do is quickly list all the document identifiers, either in a single query or a query that can be quickly paginated. By quickly I would expect this to run in <100ms or so.
The simplistic query is SELECT meta(t).id FROM testbucket order by meta(t).id
- this takes around 4s, which seems very slow to me. Limiting to the first 1000 results is only a few microseconds, but if I use OFFSET to select a later page it takes around 4s again. Explain suggests it is an “primaryscan” - which I initially thought was an index scan, but now I am not so sure.
{
“plan”: {
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “PrimaryScan3”,
“index”: “#primary”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “testbucket”,
“namespace”: “default”,
“using”: “gsi”
},
{
“#operator”: “Fetch”,
“keyspace”: “testbucket”,
“namespace”: “default”
},
{
“#operator”: “Parallel”,
“~child”: {
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “(meta((testbucket
.t
)).id
)”
}
]
}
]
}
}
]
},
{
“#operator”: “Order”,
“sort_terms”: [
{
“expr”: “(meta((testbucket
.t
)).id
)”
}
]
},
{
“#operator”: “FinalProject”
}
]
},
“text”: “SELECT meta(t).id FROM testbucket order by meta(t).id”
}
I found OFFSET and Keyset Pagination in N1QL Query | Couchbase which discusses this, what I am unsure about though is how to create an effective index on meta.id - as they explain shows it is an index scan anyhow I am unsure if I can improve on this?
The query I would really like to run is
SELECT meta(t).id FROM testbucket AS t where
doctype='CrystalStructure' order by meta(t).id
I tried creating the following index, but it didn’t change the speed.
CREATE INDEX DocumentTypeAndIDIndex ON testbucket(
doctype, metadata.id)
I also tried the simpler SELECT doctype FROM testbucket order by doctype
, given that I have an index on doctype alone, and that is still similarly slow.
I have wondered whether queries on meta.id are just not very optimal, and I should change my document to contain its own id so I can index and query on that - I am interested to know if anyone feels this would help?