@vsr1 Running SELECT DISTINCT userId FROM default WHERE clientId=$clientId AND type='User';
returns the following error:
Index scan timed out - cause: Index scan timed out
Post the EXPLAIN output. How many entries do you have matching that clientId? You may have to increase the indexer timeout.
About 5.5 million.
{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "IndexScan3",
        "covers": [
          "cover ((`default`.`clientId`))",
          "cover ((`default`.`userId`))",
          "cover ((meta(`default`).`id`))"
        ],
        "distinct": true,
        "filter_covers": {
          "cover ((`default`.`type`))": "User"
        },
        "index": "ix1",
        "index_id": "26b5b6c5f9954389",
        "index_projection": {
          "entry_keys": [
            0,
            1
          ]
        },
        "keyspace": "default",
        "namespace": "default",
        "spans": [
          {
            "exact": true,
            "range": [
              {
                "high": "\"ba2e8f22-1b7b-4407-82fa-445400fb4afd\"",
                "inclusion": 3,
                "low": "\"ba2e8f22-1b7b-4407-82fa-445400fb4afd\""
              }
            ]
          }
        ],
        "using": "gsi"
      },
      {
        "#operator": "Parallel",
        "~child": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "Filter",
              "condition": "((cover ((`default`.`clientId`)) = \"ba2e8f22-1b7b-4407-82fa-445400fb4afd\") and (cover ((`default`.`type`)) = \"User\"))"
            },
            {
              "#operator": "InitialProject",
              "distinct": true,
              "result_terms": [
                {
                  "expr": "cover ((`default`.`userId`))"
                }
              ]
            },
            {
              "#operator": "Distinct"
            },
            {
              "#operator": "FinalProject"
            }
          ]
        }
      },
      {
        "#operator": "Distinct"
      }
    ]
  },
  "text": "SELECT DISTINCT userId FROM default WHERE clientId='ba2e8f22-1b7b-4407-82fa-445400fb4afd' AND type='User'"
}
The plan is optimized. If 5.5 million entries qualify for the given clientId, then you have to increase the indexer timeout.
@vsr1, are there any tricks to get the results faster? I need to send the notifications in less than a minute.
You can use the Java SDK's asynchronous operations and see if that helps. Once you get the first userId, you can issue the second query and return results.
If you can handle DISTINCT in the application, you can use OFFSET and LIMIT,
or use keyset pagination:
CREATE INDEX ix1 ON default(clientId, userId) WHERE type='User';
SELECT DISTINCT userId FROM default WHERE clientId=$clientId AND type='User' AND userId > $userId ORDER BY userId LIMIT 1000;
Start $userId at "" and repeat the query, feeding the last userId of each page in as input to the next query, until no more rows are returned, as described here: https://blog.couchbase.com/offset-keyset-pagination-n1ql-query-couchbase/
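A minimal Node.js sketch of that loop, assuming SDK 2.6; the connection details, the eachUser helper, and the page size of 1000 are placeholders of mine:

const couchbase = require('couchbase');

const cluster = new couchbase.Cluster('couchbase://localhost'); // placeholder host
cluster.authenticate('Administrator', 'password');              // placeholder credentials
const bucket = cluster.openBucket('default');

const PAGE_SIZE = 1000;
const pageQuery = couchbase.N1qlQuery.fromString(
  "SELECT DISTINCT userId FROM default " +
  "WHERE clientId=$clientId AND type='User' AND userId > $userId " +
  "ORDER BY userId LIMIT " + PAGE_SIZE);

// Fetch one page, hand each userId to the caller, then recurse with the
// last userId of the page as the keyset anchor for the next page.
function eachUser(clientId, lastUserId, onUser, done) {
  bucket.query(pageQuery, { clientId: clientId, userId: lastUserId }, function (err, rows) {
    if (err) return done(err);
    rows.forEach(function (row) { onUser(row.userId); });
    if (rows.length < PAGE_SIZE) return done(null); // short page: no more rows
    eachUser(clientId, rows[rows.length - 1].userId, onUser, done);
  });
}

// Start userId at "" and walk the whole result set.
eachUser('ba2e8f22-1b7b-4407-82fa-445400fb4afd', '', function (userId) {
  // send the notification for this userId
}, function (err) {
  if (err) console.error('pagination failed:', err);
});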
@vsr1 I am using streaming rows in the Node.js client: https://docs.couchbase.com/nodejs-sdk/2.6/n1ql-queries-with-sdk.html#streaming-rows
I get an error event with code 1080:
1080 Timeout [setting] exceeded
Timeout was exceeded.
@brett19 How can I increase the timeout in Node.js?
Using userId > $userId, must I use ORDER BY userId to guarantee that I iterate over all users?
You can use ORDER BY userId; if the query uses the index I mentioned, the sort will be a no-op (rows come back in index order).
@vsr1 What happens if I don't use ORDER BY? I think maybe I would get some userId more than once and never get others.
Use ORDER BY and let the query engine decide; it will use the index order.
How can I increase the timeout in Node.js?
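In the 2.x Node.js SDK, the client-side N1QL timeout is a property on the Bucket object, set in milliseconds. A sketch, with a five-minute value as an example:

const couchbase = require('couchbase');
const cluster = new couchbase.Cluster('couchbase://localhost'); // placeholder host
cluster.authenticate('Administrator', 'password');              // placeholder credentials
const bucket = cluster.openBucket('default');

bucket.n1qlTimeout = 300000; // client-side N1QL query timeout, in milliseconds

Note that the indexer scan timeout mentioned earlier is a separate, server-side setting.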
@vsr1 How would you create an index for the following query? (Please explain why; I have questions about ORDER BY and !=.)
SELECT a,b FROM default
WHERE
type='MyType' AND
c=$c AND
d!=$d AND
e IN [$e1,$e2]
ORDER BY
z
@vsr1 Is it possible to cover an UPDATE too?
UPDATE default SET a='A' WHERE b='B'
UPDATE needs to update the whole document, not do a partial update, so it must fetch the document and cannot be covered.
d != $d is DNF-transformed to d < $d OR d > $d. Follow the rules described here: https://blog.couchbase.com/create-right-index-get-right-performance/
Does it check the CAS during the update, to ensure consistency?
What about ORDER BY for the following query?
SELECT a,b FROM default
WHERE
type='MyType' AND
c=$c AND
d!=$d AND
e IN [$e1,$e2]
ORDER BY
z
I think the best index is
CREATE INDEX idx1 ON bucket(c, e, d, z, a, b) WHERE type='MyType'
I think I must put z before a,b. Is that right, or is there no difference?
The UPDATE CAS check is done internally. If it fails, the statement returns an error and the application needs to retry.
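A minimal retry sketch, reusing the couchbase module and bucket handle from the earlier pagination example; retrying on any error and capping at three attempts are simplifications of mine, and a real handler should inspect the error code first:

const updateQuery = couchbase.N1qlQuery.fromString("UPDATE default SET a='A' WHERE b='B'");

function runUpdateWithRetry(attemptsLeft, done) {
  bucket.query(updateQuery, function (err, result) {
    // On failure (for example an internal CAS mismatch), try again.
    if (err && attemptsLeft > 1) return runUpdateWithRetry(attemptsLeft - 1, done);
    done(err, result);
  });
}

runUpdateWithRetry(3, function (err) {
  if (err) console.error('UPDATE failed after retries:', err);
});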
ORDER BY must follow the index key order. It is okay to skip a single equality key (here c = $c is an equality predicate, so with the index below, ORDER BY z can still use the index order).
CREATE INDEX idx1 ON bucket(c, z, e, d, a, b) WHERE type='MyType';
You can always create the index on an empty bucket, run EXPLAIN, and check the plan.
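For example (substituting the query's actual keyspace, default, for the bucket placeholder):

CREATE INDEX idx1 ON default(c, z, e, d, a, b) WHERE type='MyType';
EXPLAIN SELECT a,b FROM default WHERE type='MyType' AND c=$c AND d!=$d AND e IN [$e1,$e2] ORDER BY z;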
I really don't understand why
CREATE INDEX idx1 ON bucket(c, z, e, d, a, b) WHERE type='MyType'
is better than
CREATE INDEX idx1 ON bucket(c, e, d, z, a, b) WHERE type='MyType'
EXPLAIN shows that it first does the index scan, then filters, then sorts the data. I see how it can filter on the c field, but what about e? If you put z right after c, how can it efficiently filter on e?
It is not always possible to create an efficient index without cost-based optimization. You need to find out which one is better based on your data.
CREATE INDEX idx1 ON bucket(c, z, e, d, a, b) WHERE type='MyType';
CREATE INDEX idx2 ON bucket(c, e, d, z, a, b) WHERE type='MyType';
SELECT a,b FROM default
WHERE
type='MyType' AND
c=$c AND
d!=$d AND
e IN [$e1,$e2]
ORDER BY
z;
Which index is better depends on whether you optimize for the filter or for the whole query, how many items qualify, and so on.
idx1 is better per filter optimization, but if the query produces 1M items the sort may be expensive. In that situation, adding a z IS NOT NULL predicate makes idx2 optimal.
With rule-based optimization, you need to try different indexes and pick the right one; if required, provide a USE INDEX hint. Cost-based optimization will take care of this internally.
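For example, to force the second index defined above:

SELECT a,b FROM default USE INDEX (idx2 USING GSI)
WHERE type='MyType' AND c=$c AND d!=$d AND e IN [$e1,$e2]
ORDER BY z;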
I really cannot understand!!
Assume the following scenario:
- We have an UPDATE query that matches 1 million documents
- We are in production, with high throughput and many mutations
- We execute the UPDATE query, and it takes 15 minutes to complete
- As I understand from EXPLAIN, Couchbase first gets all the documents and then mutates them. So say that at fetch time the document with key A has CAS1; at minute 5 we change it from the Data service, so it now has CAS2; at minute 10 the query tries to mutate it with CAS1. Oh! It fails and the query returns an error
- I must retry the query, and we go back to step 3, so it is likely to fail again and again?!
In the other scenario, maybe Couchbase fetches only the document keys, and when it wants to update a document it fetches it, mutates it immediately, and then fetches the next one. That is better than the first scenario, but with a high mutation rate and millions of matching documents it is still not possible to complete the query.
What is the solution?