I would like to create a composite index with information from two arrays in a document - one is an array of strings, the other is an array of objects. Subsequently, I would like to query using values from the string array and fetch the elements/attributes of the other object array.
You can do array indexing on single array. You can create on array index on tags and store users as whole array or fetch the document.
CREATE INDEX ix1 ON default (DISTINCT ARRAY v FOR v IN tags END);
SELECT users FROM default WHERE ANY v IN tags SATISFIES v = "Tags02" END;
Above query does Fetch
CREATE INDEX ix1 ON default (DISTINCT ARRAY v FOR v IN tags END, users);
This will covering query, if users is big it can skip indexing due to index key size limit
I am currently using an index on tags and it helps fetch the count of documents with a given tag quickly (using a query similar to the one you have mentioned)
Yes, users in my document is a fairly deeply nested object (I simplified it a bit here). Fetch on the document reduces the query performance. That is why I was looking to build an index where the specific information that I am interested in is part of the index itself.
Check this index and query make sure it is using covered index. Make sure expression in index and projection exactly same with variable name
CREATE INDEX ix12 ON default (DISTINCT ARRAY v FOR v IN tags END, ARRAY u.alias FOR u IN users.audiance END);
EXPLAIN SELECT ARRAY u.alias FOR u IN users.audiance END FROM default WHERE ANY v IN tags SATISFIES v = "Tags02" END;
CREATE INDEX ix12 ON default (DISTINCT ARRAY v FOR v IN tags END, ARRAY {u.alias,u.name} FOR u IN users.audiance END);
EXPLAIN SELECT ARRAY {u.alias,u.name} FOR u IN users.audiance END FROM default WHERE ANY v IN tags SATISFIES v = "Tags02" END;
I tried the first one. The index creation worked. However the explain plan doesn’t seem to use the index. It use a PrimaryScan (I have a #primary index temporarily) followed by a Fetch. I tried using USE INDEX with no luck.
Here is the explain plan.
[
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "PrimaryScan",
"index": "#primary",
"keyspace": "discussion",
"namespace": "default",
"using": "gsi"
},
{
"#operator": "Fetch",
"keyspace": "discussion",
"namespace": "default"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Filter",
"condition": "any `c` in (`discussion`.`tags`) satisfies (`c` = \"abc\") end"
},
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "array (((`s`.`audience`).`author`).`alias`) for `s` in ((`discussion`.`content`).`sections`) end"
}
]
},
{
"#operator": "FinalProject"
}
]
}
}
]
},
"text": "SELECT ARRAY s.audience.author.alias FOR s IN content.sections END FROM discussion WHERE ANY c in tags SATISFIES c = \"abc\" END;"
}
]
Here is the DDL/DML that I using based on your suggestion
CREATE INDEX idx_spa
ON discussion (DISTINCT ARRAY c FOR c IN tags END, ARRAY s.audience.author.alias FOR s IN content.sections END)
WHERE type = 'social';
EXPLAIN SELECT ARRAY s.audience.author.alias FOR s IN content.sections END FROM discussion WHERE ANY c in tags SATISFIES c = "abc" END;
@vsr1 Thanks for pointing that out. It works. And the performance is better.
Given that the query selects an ARRAY each element is an array of one element. Is there a way to combine all these into a single array (so that I could functions like ARRAY_COUNT, ARRAY_SUM, ARRAY_DISTINCT etc) or to fetch the aliases as a scalar value and not a one element array so that I can apply similar functions on the result set?
There is no single element. If you want you can check ARRAY_CONCAT()
ARRAY s.audience.author.alias FOR s IN content.sections END
Makes array of each row .
If you are looking for whole query single array then you need to use ARRAY Aggregation
ARRAY_FLATTEN(ARRAY_AGG(ARRAY s.audience.author.alias FOR s IN content.sections END),2)