N1ql How to use a GSI when doing a join

Hello Couchbase Experts.

I have created a GSI Index but I have no idea how to write a query where I can specify how to use this newly created index in a join using the keyword “USE INDEX [index_name] [index_type]”

http://developer.couchbase.com/documentation/server/current/n1ql/n1ql-language-reference/hints.html

The documentation only tells me how to use the "USE INDEX’ key word in a simple select statement. I’d like to know if this can be done using a JOIN NEXT CONTAINS etc…

Also if you could give me some example on how to do this.

Thanks in advance.

Hi @fernando_l,

Please post your CREATE INDEX statement, and then post your SELECT statement without the USE INDEX. We first need to understand what your are trying to do.

Hi Fernando,

You can try USE INDEX with JOIN like this (using the beer-sample data set shipped with couchbase server).

CREATE INDEX beer_brewery_id ON `beer-sample`(brewery_id) WHERE (type = “beer”);

CREATE PRIMARY INDEX beer_primary ON `beer-sample`;

  1. explain select * from `beer-sample` beer USE INDEX (beer_primary) JOIN `beer-sample` brewery on keys beer.brewery_id LIMIT 4;
    [
    {
    “plan”: {
    "#operator": “Sequence”,
    "~children": [
    {
    "#operator": “PrimaryScan”,
    “index”: “beer_primary”,
    “keyspace”: “beer-sample”,
    “namespace”: “default”,
    “using”: “gsi”

  2. explain select * from `beer-sample` beer USE INDEX (beer_brewery_id) JOIN `beer-sample` brewery on keys beer.brewery_id
    where beer.type=“beer” and brewery.type = “brewery” and beer.brewery_id LIKE “%cafe%”;
    [
    {
    “plan”: {
    "#operator": “Sequence”,
    "~children": [
    {
    "#operator": “IndexScan”,
    “index”: “beer_brewery_id”,
    “index_id”: “b5217a9b9f96f44a”,
    “keyspace”: “beer-sample”,
    “namespace”: “default”,

Note that the SELECT must follow these two important rules (loosely described) to use the index:
a) must have a predicate/condition in the where-clause that specifies the leading index keys used in the create index. In this example (2) above, it uses beer.brewery_id LIKE “%cafe%”, where brewery_id is the index-key in the index-definition of the index beer-brewery-id.

b) if the index is a partial index, then the SELECT/where-clause must also have matching predicates. Example (2) above has beer.type=“beer”.

hth,
-Prasad
PS: provide your N1QL statements, as Gerald mentioned, in case you hit issues…

@geraldss @prasad Here I give you my select query.

SELECT * FROM Session_Bucket Session
NEST Events_Bucket aryEvents ON KEYS Session.aryEvents
LEFT NEST PageRule_Bucket aryPageRules ON KEYS aryEvents[*].causeId
WHERE Session.companyId = " " AND
NOT ANY Event IN aryEvents SATISFIES
ANY rule IN aryPageRules SATISFIES Event.result=‘form-completed’
AND Event.causeId=rule.ruleId
AND rule.pageGroup=‘homedefenders.com Quote Form’
END
END AND
ANY rule IN aryPageRules SATISFIES rule.pageGroup=‘unknowns’
END AND Session.aryBreStatus.flagQ1 IS MISSING AND Session.lastEventTime < (NOW_MILLIS() - 300000)

and how the index was created.

CREATE INDEX idxCompanyId ON Session_Bucket(companyId) USING GSI;

will n1ql default this index if I do not specify it in the query?

You can create the following index:

CREATE INDEX idx_company_time ON Session_Bucket( companyId, lastEventTime );

Then in your query, change

Session.lastEventTime < (NOW_MILLIS() - 300000)

to

Session.lastEventTime < $time_param

and pass $time_param as a named parameter.

Finally, post the output of EXPLAIN here using your new index and query.

This is my explain output

[
{
"#operator": “Sequence”,
"~children": [
{
"#operator": “IndexScan”,
“index”: “fernando_idx_company_time”,
“keyspace”: “Session_Bucket”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“4b602472-44e6-4c22-a674-047484d588b8"”
],
“Inclusion”: 3,
“Low”: [
"“4b602472-44e6-4c22-a674-047484d588b8"”
]
}
}
],
“using”: “gsi”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Fetch”,
“as”: “Session”,
“keyspace”: “Session_Bucket”,
“namespace”: “default”
},
{
"#operator": “Nest”,
“as”: “aryEvents”,
“keyspace”: “Events_Bucket”,
“namespace”: “default”,
“on_keys”: “(Session.aryEvents)”
},
{
"#operator": “Nest”,
“as”: “aryPageRules”,
“keyspace”: “PageRule_Bucket”,
“namespace”: “default”,
“on_keys”: “(array_star(aryEvents).causeId)”,
“outer”: true
},
{
"#operator": “Filter”,
“condition”: “((((((Session.companyId) = “4b602472-44e6-4c22-a674-047484d588b8”) and (not any Event in aryEvents satisfies any rule in aryPageRules satisfies ((((Event.result) = “form-completed”) and ((Event.causeId) = (rule.ruleId))) and ((rule.pageGroup) = “homedefenders.com Quote Form”)) end end)) and any rule in aryPageRules satisfies ((rule.pageGroup) = “homedefenders.com Quote Form”) end) and (((Session.aryBreStatus).homedefendersQuoteBreadId) is missing)) and ((Session.lastEventTime) < (now_millis() - 300000)))”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “self”,
“star”: true
}
]
},
{
"#operator": “FinalProject”
}
]
}
}
]
}
]

Thanks. But you did not use a named param. Please make that change, post the new EXPLAIN output, and tell us how long the new query takes.

I am not using the sdk to run this query I am using the workbench. Is there a increase in performance when using a named parameter?

Yes, significantly in this case. Use the cbq shell, and you can use a named parameter. I will ask my colleague @isha to help you with this.

Hi @fernando_l,

When using the cbq shell, you can create a named parameter (the parameter should have a -$ prefix) using the \SET command as follows :

\SET -$time_param 30;

Then you can execute the query using the named param.

We are currently using couchbase server 4.1.1 is this feature available in this version?

In 4.1, you can use curl to pass parameters.

okay I was able to do it using curl

{
“requestID”: “981a18ff-dbdf-4dbd-b034-67fc3e330aab”,
“signature”: “json”,
“results”: [indent preformatted text by 4 spaces
{
"#operator": “Sequence”,
"~children": [
{
"#operator": “IndexScan”,
“index”: “fernando_idx_company_time”,
“keyspace”: “Session_Bucket”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“4b602472-44e6-4c22-a674-047484d588b8"”,
"$time"
],
“Inclusion”: 1,
“Low”: [
"“4b602472-44e6-4c22-a674-047484d588b8"”
]
}
}
],
“using”: “gsi”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Fetch”,
“as”: “Session”,
“keyspace”: “Session_Bucket”,
“namespace”: “default”
},
{
"#operator": “Nest”,
“as”: “aryEvents”,
“keyspace”: “Events_Bucket”,
“namespace”: “default”,
“on_keys”: “(Session.aryEvents)”
},
{
"#operator": “Nest”,
“as”: “aryPageRules”,
“keyspace”: “PageRule_Bucket”,
“namespace”: “default”,
“on_keys”: “(array_star(aryEvents).causeId)”,
“outer”: true
},
{
"#operator": “Filter”,
“condition”: “((((((Session.companyId) = “4b602472-44e6-4c22-a674-047484d588b8”) and (not any Event in aryEvents satisfies any rule in aryPageRules satisfies ((((Event.result) = “form-completed”) and ((Event.causeId) = (rule.ruleId))) and ((rule.pageGroup) = “homedefenders.com Quote Form”)) end end)) and any rule in aryPageRules satisfies ((rule.pageGroup) = “homedefenders.com Quote Form”) end) and (((Session.aryBreStatus).homedefendersQuoteBreadId) is missing)) and ((Session.lastEventTime) \u003c $time))”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “self”,
“star”: true
}
]
},
{
"#operator": “FinalProject”
}
]
}
}
]
}
],
“status”: “success”,
“metrics”: {
“elapsedTime”: “11.120239ms”,
“executionTime”: “11.068164ms”,
“resultCount”: 1,
“resultSize”: 3460
}
}

Ok, great. Now if you compute NOW_MILLIS() - 30000 on the client side, and pass that as the value of $time, you should see better performance. Is there also a minimum time you should add to your query? Right now, it gets all events from the beginning of time, up to five minutes ago.