Index for query

rafael.felix · August 16, 2016, 12:01pm

Hello there,
I’m using Couchbase 4.1 CE, and i have a bucket with 8M documents.

I have this query
SELECT Extent3.* FROM PortalEvents as Extent1 USE KEYS [‘OperatorDocuments::365’, ‘OperatorDocuments::252’]
INNER UNNEST Extent1.documents as Extent2 INNER JOIN PortalEvents as Extent3 ON KEYS Extent2 ORDER BY Extent3.createdDate DESC LIMIT 50;

Is possible to have an index for the createdDate? Because the query is very fast is has no order, but with order is very slow.

I try creating an index:
CREATE INDEX portalevents_createdDate ON PortalEvents(createdDate) USING GSI;

But the query is not using this.

Thanks

keshav_m · August 19, 2016, 12:12am

In your case, you only for two documents which you’re unnesting.
Even in case of an index (array index), because you don’t have any other predicate, I’m not sure how much improvement you’d get. You can try to create the index with meta().id and see.

How big are the documents?

Below is an example of nested array index you can try.

You’d need to use 4.5 to exploit array indexes to create the appropriate array index.

See the example below. For multi-level indexes, you’ll have to create multi-level array index.
All these are available in 4.5.

CREATE INDEX inested ON `travel-sample`
   (DISTINCT ARRAY
		(DISTINCT ARRAY y.flight 
		 FOR y IN x.special_flights END) 
    FOR x IN schedule END) 
	WHERE type = "route" ;

===============

select * from b;

[
{
“b”: {
“session”: “d5f08f4c-37d7-4fba-ab78-045844aa98de”,
“things”: [
{
“thingId”: “1”,
“time”: 1461343192396,
“value”: “something”
},
{
“thingId”: “2”,
“time”: 1461343192396,
“value”: “23”
}
]
}
},
{
“b”: {
“session”: “d5f08f4c-37d7-4fba-ab78-045844aa98de”,
“things”: [
{
“thingId”: “3”,
“time”: 1461343774121,
“value”: “foo”
},
{
“thingId”: “2”,
“time”: 1461343775672,
“value”: “bar”
}
]
}
}
]

CREATE INDEX idxarray ON b((all (array (v.thingId) for v in things end)))

explain select v from b unnest things v where v.thingId = 1;

[
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “DistinctScan”,
“scan”: {
"#operator": “IndexScan”,
“index”: “idxarray”,
“index_id”: “fae20d0eec22dc50”,
“keyspace”: “b”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
“1”
],
“Inclusion”: 3,
“Low”: [
“1”
]
}
}
],
“using”: “gsi”
}
},
{
"#operator": “Fetch”,
“keyspace”: “b”,
“namespace”: “default”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Unnest”,
“as”: “v”,
“expr”: “(b.things)”
}
]
}
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “((v.thingId) = 1)”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “v”
}
]
},
{
"#operator": “FinalProject”
}
]
}
}
]
},
“text”: “select v from b unnest things v where v.thingId = 1;”
}
]

rafael.felix · August 19, 2016, 10:40am

Hello, the documents have 12 short fields, small documents.
Let me explain more of my problem, maybe you have a better solution.

Example Document:
{
channelId = 252,
propertyId = 123
createdDate = 'some date’
more 12 fields
}

Now i have to do this query
select * from bucket where channelId = 252 and propertyId in [123,456,…] order by createdDate desc limit 50.

My problem is when i use the IN clause it takes forever (and eventually timeout).
If i search for one property only it’s very fast, but with the IN clause is very slow.

So my idea was to create documents that have a key OperatorDocuments::channelId::propertyId and this documents will store the key to every document for this channelId and propertyId.

With this i dont have to use the IN clause and i can use a join. Like i posted before.
Now i can search multiple combination with the use keys and the join, and it’s fast, but when i order by date it’s very slow.

Any ideas how can i overcome this problem?

Thanks

vsr1 · August 19, 2016, 8:26pm

In 4.1 only 16 IN clause values push to the indexer if more than 16 corresponding predicate is not pushed to indexer. Try following suggestion.

create index idx_1 on bucket(channelId, propertyId,createdDate);

select b from (select raw meta().id from bucket where channelId = 252 and propertyId in [123,456,…] order by createdDate desc limit 50) klist JOIN bucket b ON KEYS klist ORDER BY b.createdDate desc limit 50;

4.5.0 will help improve the query performance of large(upto 8192) IN clause by pushing In clause to indexer. If possible try with 4.5.0 your original query and new query.

rafael.felix · August 26, 2016, 12:53pm

Hello there, only today i have seen this message.

With the IN clause it is impossible, it takes forever.
I have that index on the bucket.
I try to create another index only with the date, but it’s never used.

This query takes 500ms without the order by:
SELECT Extent3.* FROM PortalEvents as Extent1 USE KEYS [‘OperatorDocuments::365’, ‘OperatorDocuments::252’]
INNER UNNEST Extent1.documents as Extent2 INNER JOIN PortalEvents as Extent3 ON KEYS Extent2 ORDER BY Extent3.createdDate DESC LIMIT 50;

When i set the order by it takes forever.

I try diferent index to see if makes the order by fast, but with no success.

Is there any index that you can see that can possible be used for the order by?

Thanks

Topic		Replies	Views
Index for nested data couchbase SQL++ n1ql	4	1303	September 9, 2017
Index creation on nested structures makes the query fail SQL++ query , n1ql , index	4	720	February 10, 2021
Couchbase: Create Index on Array containing Array of objects SQL++ query , n1ql	1	725	April 19, 2022
How to create index for the pair of startDate and endDate saved in the nested document Couchbase Server	3	1207	January 25, 2017
N1QL - Creating an index for nested elements Couchbase Server 40-rc	10	8884	May 18, 2016

Index for query

Related topics