I have recently added some functionality to calculate stats on the fly. So I have a couple of queries that I want to run fast - and therefore, I have tried to create an index that should be the best match - but the query does not use it, so I must have misunderstood something
Here is a small sample with two queries that I thought would use the same index. First one for a specific user:
SELECT specieskey,SUM(count) count
FROM data
WHERE type='Catch' AND userkey='2124DEFEC111BA8FC1257ED20034B387'
AND specieskey IN ['12','1','2','13','25','4','49','38','6','7','41']
AND locationtype is valued
AND statslocation is valued
AND statspublic is valued
AND statstrippublic is valued
AND statstriptarget
AND year is valued
AND month is valued
GROUP BY specieskey
And then the same for all users (for comparison):
SELECT specieskey,SUM(count) count
FROM data
WHERE type='Catch' AND userkey is valued
AND specieskey IN ['12','1','2','13','25','4','49','38','6','7','41']
AND locationtype is valued
AND statslocation is valued
AND statspublic
AND statstrippublic
AND statstriptarget
AND year is valued
AND month is valued
GROUP BY specieskey
It is worth mentioning that there are some minor differences in the âvisibilityâ for the docs (for me: ALL, for everybody else: ONLY Public). There are more fields not used here (but defined for more granular searches by year, location, etc.)
I have created an index that I thought would be used for both of the above queries:
CREATE INDEX stats_catch_species_count
ON data (specieskey,count,locationtype,year)
WHERE d.type='Catch'
AND userkey is valued
AND specieskey is valued
AND locationtype is valued
AND statslocation is valued
AND statspublic is valued
AND statstrippublic is valued
AND statstriptarget is valued
AND year is valued
AND month is valued
But when the above queries run then the âExplainâ shows that they donât use this index:
The first query uses:
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"index": "def_type_user_1",
"index_id": "ef1ea0928c66ab90",
"index_projection": {
"primary_key": true
},
"keyspace": "data",
"namespace": "default",
"spans": [
{
"exact": true,
"range": [
{
"high": "\"Catch\"",
"inclusion": 3,
"low": "\"Catch\""
},
{
"high": "\"2124DEFEC111BA8FC1257ED20034B387\"",
"inclusion": 3,
"low": "\"2124DEFEC111BA8FC1257ED20034B387\""
}
]
}
],
"using": "gsi"
},
{
"#operator": "Fetch",
"keyspace": "data",
"namespace": "default"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Filter",
"condition": "(((((((((((`data`.`type`) = \"Catch\") and ((`data`.`userkey`) = \"2124DEFEC111BA8FC1257ED20034B387\")) and ((`data`.`specieskey`) in [\"12\", \"1\", \"2\", \"13\", \"25\", \"4\", \"49\", \"38\", \"6\", \"7\", \"41\"])) and ((`data`.`locationtype`) is valued)) and ((`data`.`statslocation`) is valued)) and ((`data`.`statspublic`) is valued)) and ((`data`.`statstrippublic`) is valued)) and (`data`.`statstriptarget`)) and ((`data`.`year`) is valued)) and ((`data`.`month`) is valued))"
},
{
"#operator": "InitialGroup",
"aggregates": [
"sum((`data`.`count`))"
],
"group_keys": [
"(`data`.`specieskey`)"
]
}
]
}
},
{
"#operator": "IntermediateGroup",
"aggregates": [
"sum((`data`.`count`))"
],
"group_keys": [
"(`data`.`specieskey`)"
]
},
{
"#operator": "FinalGroup",
"aggregates": [
"sum((`data`.`count`))"
],
"group_keys": [
"(`data`.`specieskey`)"
]
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "(`data`.`specieskey`)"
},
{
"as": "count",
"expr": "sum((`data`.`count`))"
}
]
},
{
"#operator": "FinalProject"
}
]
}
}
]
},
"text": "SELECT specieskey,SUM(count) count \n FROM data\n WHERE type='Catch' AND userkey='2124DEFEC111BA8FC1257ED20034B387' \n AND specieskey IN ['12','1','2','13','25','4','49','38','6','7','41'] \n AND locationtype is valued \n AND statslocation is valued \n AND statspublic is valued \n AND statstrippublic is valued \n AND statstriptarget \n AND year is valued \n AND month is valued \n GROUP BY specieskey"
}
The def_type_user_1
index is defined as:
CREATE INDEX def_type_user_1 ON data(type,userkey) WHERE (userkey is valued)
In this case itâs not really a problem as the query runs in approx. 20 msâŚ
For the second query, however, this DOES become a problem. It uses the same index:
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"index": "def_type_user_1",
"index_id": "ef1ea0928c66ab90",
"index_projection": {
"primary_key": true
},
"keyspace": "data",
"namespace": "default",
"spans": [
{
"exact": true,
"range": [
{
"high": "\"Catch\"",
"inclusion": 3,
"low": "\"Catch\""
},
{
"inclusion": 0,
"low": "null"
}
]
}
],
"using": "gsi"
},
{
"#operator": "Fetch",
"keyspace": "data",
"namespace": "default"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Filter",
"condition": "(((((((((((`data`.`type`) = \"Catch\") and ((`data`.`userkey`) is valued)) and ((`data`.`specieskey`) in [\"12\", \"1\", \"2\", \"13\", \"25\", \"4\", \"49\", \"38\", \"6\", \"7\", \"41\"])) and ((`data`.`locationtype`) is valued)) and ((`data`.`statslocation`) is valued)) and (`data`.`statspublic`)) and (`data`.`statstrippublic`)) and (`data`.`statstriptarget`)) and ((`data`.`year`) is valued)) and ((`data`.`month`) is valued))"
},
{
"#operator": "InitialGroup",
"aggregates": [
"sum((`data`.`count`))"
],
"group_keys": [
"(`data`.`specieskey`)"
]
}
]
}
},
{
"#operator": "IntermediateGroup",
"aggregates": [
"sum((`data`.`count`))"
],
"group_keys": [
"(`data`.`specieskey`)"
]
},
{
"#operator": "FinalGroup",
"aggregates": [
"sum((`data`.`count`))"
],
"group_keys": [
"(`data`.`specieskey`)"
]
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "(`data`.`specieskey`)"
},
{
"as": "count",
"expr": "sum((`data`.`count`))"
}
]
},
{
"#operator": "FinalProject"
}
]
}
}
]
},
"text": "SELECT specieskey,SUM(count) count \n FROM data\n WHERE type='Catch' AND userkey is valued \n AND specieskey IN ['12','1','2','13','25','4','49','38','6','7','41'] \n AND locationtype is valued \n AND statslocation is valued \n AND statspublic \n AND statstrippublic \n AND statstriptarget \n AND year is valued \n AND month is valued \n GROUP BY specieskey"
}
This query takes ~10 seconds to runâŚ
So what am I doing wrong here? I have tried to include all the fields used for the condition and for the resultset. And I also try to optimize it by naming all of the fields in the condition in the same order (just as âŚis valued) to make a closer match to the index.
Follow up question: Once this has been âfixedâ I will need to run a similar query on CB Lite (.Net) - so what would be the best index on that platform?
Iâm on Couchbase Server 6.0.0 Community Edition