Index on UNNEST JOIN

Hi there,

I am looking to find an index that can optimize a query with an UNNEST join, where fields and filters are from both sides of the JOIN. We currently have 15k users on the database, and the structure of one user is the following:

"user": {
  "name": "John Doe",
  "birthday": "2000-01-01",
  "active": true,
  "email": "john@doe.com",
  "type": "User",
  "stores": [{
     "status": "client"
     "storeId": 1,
     "signup": "2017-01-01"
     "special": true

   },
   {
     "status": "client"
     "storeId": 2,
     "special": false
   }]
}

Here is an example of a query:

SELECT users.name, users.email, users.otherProperties, stores.signup, stores.otherProperties
FROM default users
UNNEST users.stores as stores
WHERE users.type="User"
AND users.active=true
AND stores.status="client"
AND stores.special=true
ORDER BY stores.signup DESC
LIMIT 10

With 15 k documents, a query like this would take more than 5 seconds to run. Please note that the fields on the query, the filters and the order statement are dynamic. Is there a way of making this query perform better?

Thank you for your time,
Manuel

Check both indexes and see which performs better. The UNNEST alias needs to match with Array index variable.

CREATE INDEX ix1 ON default(DISTINCT ARRAY brands.status FOR brands IN stores END) WHERE type = "User" AND active = true;
OR
CREATE INDEX ix1 ON default(DISTINCT ARRAY brands.special FOR brands IN stores END) WHERE type = "User" AND active = true;

SELECT users.name, users.email, users.otherProperties, brands.signup
FROM default users UNNEST users.stores AS brands
WHERE users.type="User" AND users.active=true
AND brands.status="client" AND brands.special=true
ORDER BY stores.signup DESC LIMIT 10;

You can also use following technique push both the predicates to indexer when both equality predicates

CREATE INDEX ix3 ON default(DISTINCT ARRAY [brands.status,brands.active] FOR brands IN stores END) WHERE type = "User" AND active = true;

SELECT users.name, users.email, users.otherProperties, brands.signup
FROM default users UNNEST users.stores AS brands
WHERE users.type="User" AND users.active=true
AND [brands.status,brands.special] = ["client",true]
ORDER BY stores.signup DESC LIMIT 10;

Hi @vsr1, thank you for the prompt reply. I have tested the indexes, but the query still takes more than 5 seconds to run. The mentioned indexes work well for these filters, but since they are dynamic it is possible that the filters don’t exist or exist in conjunction with other filters, meaning that the index will not be as efficient (or not be used in the query at all).

For example, another possible query could be:

SELECT users.otherProperties, stores.signup
FROM default users UNNEST users.stores AS stores
WHERE users.type="User" AND users.birthday >= "1990-01-01"
AND stores.storeId=1 AND stores.signup>="2017-01-01"
ORDER BY stores.signup DESC LIMIT 10;

It seems impossible to me to write an index that could cover such a dynamic range of filters. Would it be wiser to restructure the current data model to accommodate our performance requirements?

Thanks!

EDIT: The use case for this kind of queries is a user table, that a different store administrator can access, as well as a global administrator. In this user table you can check the user information, and filter and sort a paginated result set.

It might be taking time because ORDER which needs to produce all possible results even though there is LIMIT 10.

You can create index like this. This singup and birthday, type applied during index scan produce qualified document and UNNEST further filters from array elements not qualified.

CREATE INDEX ix1 ON default(birthday, ARRAY v.signup FOR v IN stores END) WHERE type = "User";

    SELECT users.otherProperties, stores.signup
    FROM default users UNNEST users.stores AS stores
    WHERE users.type="User" AND users.birthday >= "1990-01-01"
    AND ANY v IN users.stores SATISFIES v.signup>="2017-01-01" 
    AND stores.storeId=1 AND stores.signup>="2017-01-01"
    ORDER BY stores.signup DESC LIMIT 10;

The query now took 8.5 seconds to run. Here is the EXPLAIN:

{
    "requestID": "dd8b1e3d-2e91-4103-b608-457cb8351972",
    "clientContextID": "725b88a6-4e53-451e-8cee-f9aa79dff8cf",
    "signature": "json",
    "results": [
        {
            "plan": {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "Sequence",
                        "~children": [
                            {
                                "#operator": "IndexScan",
                                "index": "ix1",
                                "index_id": "1458255227e07249",
                                "keyspace": "default",
                                "namespace": "default",
                                "spans": [
                                    {
                                        "Range": {
                                            "Inclusion": 1,
                                            "Low": [
                                                "\"1990-01-01\""
                                            ]
                                        }
                                    }
                                ],
                                "using": "gsi"
                            },
                            {
                                "#operator": "Fetch",
                                "as": "users",
                                "keyspace": "default",
                                "namespace": "default"
                            },
                            {
                                "#operator": "Parallel",
                                "~child": {
                                    "#operator": "Sequence",
                                    "~children": [
                                        {
                                            "#operator": "Unnest",
                                            "as": "stores",
                                            "expr": "(`users`.`stores`)"
                                        }
                                    ]
                                }
                            },
                            {
                                "#operator": "Parallel",
                                "~child": {
                                    "#operator": "Sequence",
                                    "~children": [
                                        {
                                            "#operator": "Filter",
                                            "condition": "((((((`users`.`type`) = \"User\") and (\"1990-01-01\" <= (`users`.`birthday`))) and ((`stores`.`storeId`) = \"b44412f1-5595-44fc-a90d-0983fb60ec73\")) and any `v` in (`users`.`stores`) satisfies (\"2017-01-01\" <= (`v`.`signup`)) end) and (\"2017-01-01\" <= (`stores`.`signup`)))"
                                        },
                                        {
                                            "#operator": "InitialProject",
                                            "result_terms": [
                                                {
                                                    "expr": "(`users`.`name`)"
                                                },
                                                {
                                                    "expr": "(`stores`.`signup`)"
                                                }
                                            ]
                                        }
                                    ]
                                }
                            }
                        ]
                    },
                    {
                        "#operator": "Order",
                        "limit": "10",
                        "sort_terms": [
                            {
                                "desc": true,
                                "expr": "(`stores`.`signup`)"
                            }
                        ]
                    },
                    {
                        "#operator": "Limit",
                        "expr": "10"
                    },
                    {
                        "#operator": "FinalProject"
                    }
                ]
            },
            "text": "SELECT users.name, stores.signup\nFROM default users USE INDEX(ix1)\nUNNEST users.stores AS stores\nWHERE users.type=\"User\" \nAND users.birthday >= \"1990-01-01\"\nAND stores.storeId=\"b44412f1-5595-44fc-a90d-0983fb60ec73\"\nAND ANY v IN users.stores SATISFIES v.signup>=\"2017-01-01\" END\nAND stores.signup >=\"2017-01-01\"\nORDER BY stores.signup DESC \nLIMIT 10"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "3.603619ms",
        "executionTime": "3.55958ms",
        "resultCount": 1,
        "resultSize": 4568
    }
}

The user document I have put here is a simplified version of the actual document. We are creating an index on the birthday, but every user field could potentially be a part of the filter. Should we index all of those fields that can be a part of the filter?

Typo DISTINCT keyword is missing

CREATE INDEX ix1 ON default(birthday, DISTINCT ARRAY v.signup FOR v IN stores END) WHERE type = “User”;

Also checkout https://developer.couchbase.com/documentation/server/current/n1ql/n1ql-language-reference/adaptive-indexing.html