Indexing not working as expected

mustanish · December 7, 2017, 11:33am

Hi,

Following is the document which I get after running explain on one of the query.

[
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “DistinctScan”,
“scan”: {
"#operator": “IndexScan”,
“index”: “activity_pins”,
“index_id”: “2c66f03e9193a364”,
“keyspace”: “activity”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“atlas”"
],
“Inclusion”: 3,
“Low”: [
"“atlas”"
]
}
},
{
“Range”: {
“High”: [
"“review”"
],
“Inclusion”: 3,
“Low”: [
"“review”"
]
}
}
],
“using”: “gsi”
}
},
{
"#operator": “Fetch”,
“keyspace”: “activity”,
“namespace”: “default”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Unnest”,
“as”: “pin”,
“expr”: “(activity.pins)”
},
{
"#operator": “Filter”,
“condition”: “((((pin.pin_type) in [“review”, “atlas”]) and ((pin.place_id) = “MMI000”)) and ((pin.status) in [0, “0”]))”
},
{
"#operator": “InitialGroup”,
“aggregates”: [
“count((pin.pin_id))”,
“sum((pin.pin_rate))”
],
“group_keys”: [
"(pin.pin_type)"
]
}
]
}
},
{
"#operator": “IntermediateGroup”,
“aggregates”: [
“count((pin.pin_id))”,
“sum((pin.pin_rate))”
],
“group_keys”: [
"(pin.pin_type)"
]
},
{
"#operator": “FinalGroup”,
“aggregates”: [
“count((pin.pin_id))”,
“sum((pin.pin_rate))”
],
“group_keys”: [
"(pin.pin_type)"
]
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “InitialProject”,
“result_terms”: [
{
“as”: “total_pin”,
“expr”: “count((pin.pin_id))”
},
{
“as”: “total_rating”,
“expr”: “sum((pin.pin_rate))”
},
{
“expr”: “(pin.pin_type)”
}
]
},
{
"#operator": “FinalProject”
}
]
}
}
]
},
“text”: “SELECT COUNT(pin.pin_id) AS total_pin,SUM(pin.pin_rate) AS total_rating, pin.pin_type FROM activity UNNEST activity.pins AS pin WHERE pin.pin_type IN[‘review’,‘atlas’] AND \npin.place_id=‘MMI000’ AND pin.status IN [0,‘0’] GROUP BY pin.pin_type;”
}
]
This query is taking some where aroung 200ms what should I do to optimize it, I have already done indexing on one of the field. what more should be done to optimize it.

vsr1 · December 7, 2017, 1:18pm

The Plan is right based on the index you have. Try the following Index.

CREATE INDEX ix1 ON activity (DISTINCT ARRAY pin.place_id FOR pin IN pins);

mustanish · December 7, 2017, 8:13pm

@vsr1 I have already applied indexing on pin.pin_type as you can see, If I apply indexing on place_id as suggested by you will both indexing will be considered for fetching data.

vsr1 · December 7, 2017, 8:29pm

It already fetching data ("#operator": “Fetch”).

When using ARRAY indexing predicates on multiple fields you can only push one of them others will be applied post scan/fetch.

If pin.pin_type has many items indexscan produce many items and require fetch all of them.
If pin.place_id is less items it will do better because indexscan do lot of elimination

You need to try which one works for your data model and create that index and drop other one or USE INDEX

mustanish · December 7, 2017, 9:12pm

@vsr1 thanks a lot that has cleared a lot things.

mustanish · December 8, 2017, 4:08pm

@vsr1 as suggested by you I applied indexing on pin_type but following query is taking 500ms

SELECT pin FROM activity UNNEST activity.pins AS pin WHERE pin.pin_type IN [“atlas”,“review”,“report”] AND pin.statusIN[1,“1”]
AND META(activity).id NOT IN [“mustanish123”] ORDER BY pin.pin_create_date DESC LIMIT 10 OFFSET 0

Explain for same is below

{
“plan”: {
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “IntersectScan”,
“scans”: [
{
“#operator”: “DistinctScan”,
“scan”: {
“#operator”: “IndexScan2”,
“index”: “#activity”,
“index_id”: “95fb99b21f9ad6f4”,
“keyspace”: “activity”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: “"mustanish123"”,
“inclusion”: 0,
“low”: “null”
}
]
},
{
“exact”: true,
“range”: [
{
“inclusion”: 0,
“low”: “"mustanish123"”
}
]
}
],
“using”: “gsi”
}
},
{
“#operator”: “DistinctScan”,
“scan”: {
“#operator”: “IndexScan2”,
“index”: “activity_pins”,
“index_id”: “2c66f03e9193a364”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “activity”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: “"atlas"”,
“inclusion”: 3,
“low”: “"atlas"”
}
]
},
{
“exact”: true,
“range”: [
{
“high”: “"report"”,
“inclusion”: 3,
“low”: “"report"”
}
]
},
{
“exact”: true,
“range”: [
{
“high”: “"review"”,
“inclusion”: 3,
“low”: “"review"”
}
]
}
],
“using”: “gsi”
}
}
]
},
{
“#operator”: “Fetch”,
“keyspace”: “activity”,
“namespace”: “default”
},
{
“#operator”: “Parallel”,
“~child”: {
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “Unnest”,
“as”: “pin”,
“expr”: “(activity.pins)”
}
]
}
},
{
“#operator”: “Parallel”,
“~child”: {
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “Filter”,
“condition”: “((((pin.pin_type) in ["atlas", "review", "report"]) and ((pin.status) in [1, "1"])) and (not ((meta(activity).id) in ["mustanish123"])))”
},
{
“#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “pin”
}
]
}
]
}
}
]
},
{
“#operator”: “Order”,
“limit”: “10”,
“sort_terms”: [
{
“desc”: true,
“expr”: “(pin.pin_create_date)”
}
]
},
{
“#operator”: “Limit”,
“expr”: “10”
},
{
“#operator”: “FinalProject”
}
]
},
“text”: “SELECT pin FROM activity UNNEST activity.pins AS pin WHERE pin.pin_type IN ["atlas","review","report"] AND pin.statusIN[1,"1"] \nAND META(activity).id NOT IN ["mustanish123"] ORDER BY pin.pin_create_date DESC LIMIT 10 OFFSET 0”
}

vsr1 · December 8, 2017, 4:36pm

It using IntersectScan. You should avoid it by dropping one of the index or USE INDEX.

Please go through https://blog.couchbase.com/optimize-n1ql-performance-using-request-profiling/ and checkout which one taking time and optimize accordingly.

mustanish · December 9, 2017, 7:59am

Hi,

Please see below json it does not have IntersectScan in it.

{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Sequence”,
"~children": [
{
"#operator": “DistinctScan”,
“scan”: {
"#operator": “IndexScan2”,
“index”: “activity_pins”,
“index_id”: “73d99617ce9c4d01”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “activity”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: ““atlas””,
“inclusion”: 3,
“low”: ““atlas””
}
]
},
{
“exact”: true,
“range”: [
{
“high”: ““report””,
“inclusion”: 3,
“low”: ““report””
}
]
},
{
“exact”: true,
“range”: [
{
“high”: ““review””,
“inclusion”: 3,
“low”: ““review””
}
]
}
],
“using”: “gsi”
}
},
{
"#operator": “Fetch”,
“keyspace”: “activity”,
“namespace”: “default”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Unnest”,
“as”: “pin”,
“expr”: “(activity.pins)”
}
]
}
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “(((not ((pin.user_name) = “mustanish123”)) and ((pin.pin_type) in [“atlas”, “review”, “report”])) and ((pin.status) in [1, “1”]))”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “pin”
},
{
“as”: “distance”,
“expr”: “(6371 * acos((((cos(radians(“28.549056336607507”)) * cos(radians((pin.place_lat)))) * cos((radians((pin.place_long)) - radians(“77.2689280000161”)))) + (sin(radians(“28.549056336607507”)) * sin(radians((pin.place_lat)))))))”
}
]
}
]
}
}
]
},
{
"#operator": “Order”,
“limit”: “10”,
“sort_terms”: [
{
“desc”: true,
“expr”: “(pin.pin_create_date)”
}
]
},
{
"#operator": “Limit”,
“expr”: “10”
},
{
"#operator": “FinalProject”
}
]
},
“text”: “SELECT pin,(6371*acos(cos(radians(‘28.549056336607507’))*cos(radians(pin.place_lat))cos(radians(pin.place_long)-radians(‘77.2689280000161’))+sin(radians(‘28.549056336607507’))\nsin(radians(pin.place_lat)))) AS distance FROM activity UNNEST activity.pins AS pin WHERE pin.user_name!=‘mustanish123’ AND pin.pin_type IN [“atlas”,“review”,“report”] \nAND pin.statusIN[1,“1”] ORDER BY pin.pin_create_date DESC LIMIT 10 OFFSET 0”
}

I have more 158,125 documents in this bucket and this query is taking more than 500 ms

vsr1 · December 9, 2017, 8:37am

When Query has order by it needs to produce all qualifying documents and do sort it takes time. You can try with out order by it will be very fast. Also follow above link if you need further optimizations

mustanish · December 11, 2017, 10:19pm

@vsr1 Thanks a lot that helped me solved my problem…

Topic		Replies	Views
Using Index not working correctly or am I using it wrong? SQL++ query , n1ql , server , index	12	3611	September 21, 2023
Multiple Array Search In Single Query Does Not Support Indexing on `OR` Operator SQL++ query , n1ql , index	3	1319	October 5, 2017
Index not working on Arrays SQL++	1	934	September 27, 2017
Index Required for the N1QL SQL++ query	4	976	December 28, 2017
N1QL execution plan SQL++	7	1695	September 12, 2017

Indexing not working as expected

Related topics