I’m developing an analytics service that provides search engine functionality for some time-series data stored in a couchbase-analytics node.
The data contains a number of fields, similar to
{
"id": "5434fs",
"customerId": "customer1",
"firstName": "John",
"lastName": "Doe",
"type": "FOO",
"timestamp": 1590587636,
"history": [
{
"status": "dispatched",
"tracking": "3432432423",
"timestamp": 1590587636
},
{
"status": "recieved",
"timestamp": 1590507636
}
]
}
The service allows users to apply filters on one or more fields, so the N1QL query looks like this
SELECT `o`.`id`,
`o`.`firstName`,
`o`.`lastName`,
`o`.`type`,
`o`.`timestamp`,
`o`.`status`,
`o`.`tracking`
FROM `orders` o
UNNEST `o`.`history` h
WHERE `o`.`customerId` = $`customerId`
AND `h`.`status` = 'dispatched'
AND (IS_NULL($type) OR `o`.`type` = $type)
AND (IS_NULL($firstName) OR LOWER(`o`.`firstName`) = LOWER($firstName))
AND (IS_NULL($lastName) OR LOWER(`o`.`lastName`) = LOWER($lastName))
ORDER BY `o`.`timestamp` DESC
LIMIT $`limit`
OFFSET $`offset`
To start off with we added an index on pr
.customerId
and pr
.timestamp
CREATE INDEX idx_customerId_timestamp ON `orders`(
`customerId`:STRING,
`timestamp`:BIGINT
)
The performance of the query on our Couchbase Enterprise Edition 6.0.4 build 3082 node wasn’t good enough for what we want to achieve so we have been trying to optimize it with little success so far.
We tried creating an index that covered all the parameters
CREATE INDEX idx_all_params ON `orders`(
`customerId`:STRING,
`timestamp`:BIGINT,
`type`:STRING,
`firstName`:STRING,
`lastName`: STRING
)
But it resulted in some odd behaviour where queries would not return results with a timestamp before the idx_all_params
index was created. From researching indexes a bit more I’ve learned that the use of LOWER means that indexes can’t be used on those fields, so we plan to make all the data lower case. It also doesn’t appear to be possible to index the unnest, so again we plan to refactor the data to remove it.
Are we on the right track here? Are we best creating single indexes that cover all parameters, or should we make multiple smaller indexes that can be used in parrallel?