Help for optimizing

socketman2016 · April 13, 2019, 5:21am

vsr1:

GROUP BY, ORDER BY must produce all qualified documents first before finalizing and streaming.

If you need streaming you can do following which is almost streaming
    CREATE INDEX  ix1 ON default( clientId, userId ) WHERE type='User';
    SELECT DISTINCT userId FROM  default WHERE clientId=$clientId AND type='User';
For Each User repeat the following query
  CREATE INDEX  ix2 ON default( userId , updateDate DESC, token, language) WHERE type='UserFcmToken';

  SELECT    token, language, updateDate, userId
  FROM default 
  WHERE type='UserFcmToken' AND userId = $userId 
  ORDER BY updateDate DESC
  LIMIT 100;
CE limits number of query service cores to 4.

@vsr1 runing SELECT DISTINCT userId FROM default WHERE clientId=$clientId AND type='User'; returns the following error
Index scan timed out - cause: Index scan timed out

vsr1 · April 13, 2019, 5:29am

Post explain. How many entries you have matching clientId. you may have to increase indexer timeout

socketman2016 · April 13, 2019, 5:37am

About 5.5 millions

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "IndexScan3",
        "covers": [
          "cover ((`default`.`clientId`))",
          "cover ((`default`.`userId`))",
          "cover ((meta(`default`).`id`))"
        ],
        "distinct": true,
        "filter_covers": {
          "cover ((`default`.`type`))": "User"
        },
        "index": "ix1",
        "index_id": "26b5b6c5f9954389",
        "index_projection": {
          "entry_keys": [
            0,
            1
          ]
        },
        "keyspace": "default",
        "namespace": "default",
        "spans": [
          {
            "exact": true,
            "range": [
              {
                "high": "\"ba2e8f22-1b7b-4407-82fa-445400fb4afd\"",
                "inclusion": 3,
                "low": "\"ba2e8f22-1b7b-4407-82fa-445400fb4afd\""
              }
            ]
          }
        ],
        "using": "gsi"
      },
      {
        "#operator": "Parallel",
        "~child": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "Filter",
              "condition": "((cover ((`default`.`clientId`)) = \"ba2e8f22-1b7b-4407-82fa-445400fb4afd\") and (cover ((`default`.`type`)) = \"User\"))"
            },
            {
              "#operator": "InitialProject",
              "distinct": true,
              "result_terms": [
                {
                  "expr": "cover ((`default`.`userId`))"
                }
              ]
            },
            {
              "#operator": "Distinct"
            },
            {
              "#operator": "FinalProject"
            }
          ]
        }
      },
      {
        "#operator": "Distinct"
      }
    ]
  },
  "text": "SELECT DISTINCT userId FROM default WHERE clientId='ba2e8f22-1b7b-4407-82fa-445400fb4afd' AND type='User'"
}

vsr1 · April 13, 2019, 5:40am

Plan is optimized. Given clientId 5.5 million qualifies? Then you have to increase the Indexer time out.

socketman2016 · April 13, 2019, 5:43am

@vsr1 , is there any tricks to get result faster? I need to send notifications less than a minutes

vsr1 · April 13, 2019, 5:46am

You can use JavaSDK aynchrouns operation see if that helps. Once you get first userId you can use second query and return results.

If you can handle DISTINCT in application, you can use OFFSET and LIMIT

OR use keyset pagination

CREATE INDEX ix1 ON default( clientId, userId ) WHERE type=‘User’;
SELECT DISTINCT userId FROM default WHERE clientId=$clientId AND type=‘User’ AND userId > $userId ORDER BY userId LIMIT 1000;
start userId with “” and repeat query until now more rows with last userId as input to next one As described here https://blog.couchbase.com/offset-keyset-pagination-n1ql-query-couchbase/

socketman2016 · April 13, 2019, 6:21am

@vsr1 I am using Streaming rows in nodejs client https://docs.couchbase.com/nodejs-sdk/2.6/n1ql-queries-with-sdk.html#streaming-rows

I have error event with code 1080

1080 Timeout [setting] exceeded Timeout was exceeded.

@brett19 How can I increate timeout in nodejs?

socketman2016 · April 13, 2019, 6:25am

Using userId > $userId , must I use ORDER BY userId to guarantee that I iterate all users?

vsr1 · April 13, 2019, 6:28am

you can use ORDER BY userId if it uses index i mentioned it will be no-op.

socketman2016 · April 13, 2019, 6:42am

@vsr1 What happens if I dont use ORDER BY , I think may be I get a userId mode than one time and never get other userId

vsr1 · April 13, 2019, 6:49am

User ORDER BY and let query decide and use index order.

socketman2016 · April 13, 2019, 6:51am

How can I increate timeout in nodejs?

socketman2016 · April 14, 2019, 5:39am

@vsr1 How you create index for the following query (Please explain why , I have question about ORDER BY and !=)

SELECT a,b FROM default 
WHERE 
  type='MyType' AND
  c=$c AND
  d!=$d AND
  e IN [$e1,$e2]
ORDER BY
  z

socketman2016 · April 14, 2019, 5:51am

@vsr1 Is it possible to cover UPDATE too?

UPDATE default SET a='A' WHERE b='B'

vsr1 · April 14, 2019, 6:17am

UPDATE need to update whole document, not partial update. So it needs Fetch the document and can’t be covered.

d != $d is DNF transformed d < $d || d > $d. Follow the rules described here https://blog.couchbase.com/create-right-index-get-right-performance/.

socketman2016 · April 14, 2019, 8:10am

UPDATE need to update whole document, not partial update. So it needs Fetch the document and can’t be covered.

Is it check CAS during update? to ensure consistency

What about ORBER BY?

for the following query

SELECT a,b FROM default 
WHERE 
  type='MyType' AND
  c=$c AND
  d!=$d AND
  e IN [$e1,$e2]
ORDER BY
  z

I think the best index is

CREATE INDEX idx1 bucket(c, e, d, z, a ,b) WHERE type='MyType'

I think , I must put z before a,b , is it right? or no difference?

vsr1 · April 14, 2019, 1:38pm

UPDATE CAS check done internally. If it fails it returns error and application needs to retry.
ORDER BY must follow index key order. It is okay to skipping single equality key.

CREATE INDEX idx1 bucket(c, z, e, d, a ,b) WHERE type=‘MyType’

You can always create index on empty bucket and do EXPLAIN and checkout.

socketman2016 · April 14, 2019, 3:35pm

I really don’t understand why

CREATE INDEX idx1 bucket(c, z, e, d, a ,b) WHERE type=‘MyType’

Is better than

CREATE INDEX idx1 bucket(c, e, d, z, a ,b) WHERE type='MyType'

Explain shows that first it index scan, then filter, then sort data, So how it can filter data in c field but what about e? You put z after c , how it can efficiently filter e?

vsr1 · April 14, 2019, 3:48pm

it is not always possible to create efficient index without Cost Based optimization.
You need to findout which one better based on your data.

CREATE INDEX idx1 bucket(c, z, e, d, a ,b) WHERE type=‘MyType’
CREATE INDEX idx2 bucket(c, e, d, z, a ,b) WHERE type='MyType'

SELECT a,b FROM default 
WHERE 
  type='MyType' AND
  c=$c AND
  d!=$d AND
  e IN [$e1,$e2]
ORDER BY
  z;

Which index is better for depends on Filter optimization or whole query optimization. How many items
qualify etc…
ix1 is better per filter optimization. If the query produces 1M items sorting may expensive.
In that situation if adding z NOT NULL predicate ix2 optimal.

In rule based optimization, You need to try different indexes and pick right one. If required provide USE INDEX hint. Cost based optimization will take care of this internally.

socketman2016 · April 16, 2019, 4:35am

I really cannot understand!!
Assume this following scenario

We have an UPDATE query that matches 1 million documents
We are in production and have high throughput and mutations
We execute UPDATE query , to be completed it takes 15 minutes
As I understand from EXPLAIN , couchbase first get all documents , then mutate it , So if at fetching a document with key A has CAS1 , at 5th minutes we change it from Data service , so it have now CAS2 , at 10th minutes, query want to mutate it with CAS1 , Oh! failed and query returns error
I must retry query , and we have go to step3 , So it is likely to failed again and again?!

On other scenario may be , couchbase fetches only doc keys and when it want update document it fetches document then mutate it immediately and fetches next document , it is better scenario than the first, but still in high mutation scenario and millions matching document , it’s not possible to complete query

What is the solution?

Topic		Replies	Views
Help me understand indexing and querying SQL++	6	1811	May 5, 2017
Best Index for the Aggregate and GroupBy query SQL++	2	747	November 7, 2019
Need help for the array query SQL++ array-query	10	1204	January 19, 2021
Best index for GROUP BY query? SQL++	1	3169	December 9, 2015
Create index optimize SQL++ query	15	594	February 1, 2024

Help for optimizing

Related topics