[4.5 beta] n1ql performance with where in [array] is slow

moon0326 · June 1, 2016, 6:27pm

Hi,

I’m trying to implement a user list with followers in n1ql. I already have it working with views, but trying to see if I can get better performance with n1ql.

Desired output:

user A: 3
user B: 5
user C: 10

Current document structure

{
     "entityType": "activity",
     "target": "user",
     "target_id": "a user id here"
     "action": "follow"
}

Index

create index idx_activity_follow on mybucket (entityType, target, target_id, action);

Getting # of followers for a single user works fine. It takes under 10ms.

select count(target_id) from mybucket where entityType='activity' and target='user' and action='follow' and target_id='user_id'

But getting # of followers for multiple users takes long time. It takes over a minute and eventually displays index scan time out.

select count(target_id) from mybucket where entityType='activity' and target='user' and action='follow' and target_id in ['user_id', 'user_id2']

Explain

{
    "requestID": "c802177d-235b-475e-b8bf-27280a64a56c",
    "signature": "json",
    "results": [
        {
            "plan": {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "IndexScan",
                        "covers": [
                            "cover ((`mybucket`.`entityType`))",
                            "cover ((`mybucket`.`target`))",
                            "cover ((`mybucket`.`target_id`))",
                            "cover ((`mybucket`.`action`))",
                            "cover ((meta(`mybucket`).`id`))"
                        ],
                        "index": "idx_activity_follow",
                        "index_id": "e88a694859a8a70a",
                        "keyspace": "mybucket",
                        "namespace": "default",
                        "spans": [
                            {
                                "Range": {
                                    "High": [
                                        "successor(\"activity\")"
                                    ],
                                    "Inclusion": 1,
                                    "Low": [
                                        "\"activity\""
                                    ]
                                }
                            }
                        ],
                        "using": "gsi"
                    },
                    {
                        "#operator": "Parallel",
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "Filter",
                                    "condition": "((((cover ((`mybucket`.`entityType`)) = \"activity\") and (cover ((`mybucket`.`target`)) = \"user\")) and (cover ((`mybucket`.`action`)) = \"follow\")) and (cover ((`mybucket`.`target_id`)) in [\"kakin\"]))"
                                },
                                {
                                    "#operator": "InitialGroup",
                                    "aggregates": [
                                        "count(cover ((`mybucket`.`target_id`)))"
                                    ],
                                    "group_keys": []
                                }
                            ]
                        }
                    },
                    {
                        "#operator": "IntermediateGroup",
                        "aggregates": [
                            "count(cover ((`mybucket`.`target_id`)))"
                        ],
                        "group_keys": []
                    },
                    {
                        "#operator": "FinalGroup",
                        "aggregates": [
                            "count(cover ((`mybucket`.`target_id`)))"
                        ],
                        "group_keys": []
                    },
                    {
                        "#operator": "Parallel",
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "InitialProject",
                                    "result_terms": [
                                        {
                                            "expr": "count(cover ((`mybucket`.`target_id`)))"
                                        }
                                    ]
                                },
                                {
                                    "#operator": "FinalProject"
                                }
                            ]
                        }
                    }
                ]
            },
            "text": "select count(target_id) from mybucket where entityType='activity' and target='user' and action='follow' and target_id in ['kakin']"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "4.180504ms",
        "executionTime": "4.143132ms",
        "resultCount": 1,
        "resultSize": 3935
    }
}

Is there a way to improve it?

geraldss · June 1, 2016, 8:02pm

For starters, target_id should be the first key in your index. For the other index keys, put the more selective keys first, and the least selective keys last. Your index should be:

create index idx_activity_follow on mybucket (target_id, ( other keys in decreasing order of selectivity ) );

This query will be very fast in 4.5 GA, because it will use an IndexCountScan. Not sure what version you are running on. But please try the above index first.

Thanks.

moon0326 · June 1, 2016, 9:02pm

Hi,

I’m on 4.5 beta.

I’ve created a new index as you suggested. Some of the query works just fine, but some get index timeout. I’m trying to find out what’s causing it. I will report back.

I also totally forgot that I need to group by target_id in order get # of followers for each user.

Is there a way to download 4.5 GA? I only see 4.5.0-BETA on http://www.couchbase.com/nosql-databases/downloads

Thanks,
Moon

prasad · June 2, 2016, 8:57am

Hi Moon, what is the OS/version? I will send you the GA release candidate bits. Btw, how many documents do you have? Can you also check indexer/query logs for any error messages (for the index timeout issue)…
-Prasad

moon0326 · June 2, 2016, 9:36am

Hi @prasad,

We will be running production version on linux, but I’m testing 4.5 on my mac os for now.

I don’t see any errors at the moment.

Thank you for the reply

moon0326 · June 2, 2016, 5:10pm

I’m still waiting for a link to GA from prasad, but I found something very strange.

The following query returns super fast.

select count(target_id) from mybucket where target_id in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)

The following query hang.
select count(target_id) from mybucket where target_id in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16, 17)

Whenever I pass 16+ ids, it simply hangs. Both of them have the same explain output.

geraldss · June 2, 2016, 5:16pm

Hi @moon0326,

That is fixed in GA. And I think you meant square brackets instead of parentheses.

moon0326 · June 2, 2016, 5:17pm

Good to know!

Oh yeah brackets.

I will try GA once I get a link to it.

Thanks!
Moon

Topic		Replies	Views
N1QL query taking more time to execute SQL++	6	823	October 24, 2018
N1QL Count() slow with WHERE SQL++ n1ql	15	10529	October 24, 2016
N1ql array index performance SQL++	10	1684	December 2, 2016
Slow N1QL performance on covered queries SQL++ query , n1ql , index	5	2109	August 8, 2017
N1ql - cover index is way slower than expected SQL++	2	1127	May 24, 2017

[4.5 beta] n1ql performance with where in [array] is slow

Related topics