Hi i’m quite new to couchbase, but read and benchmarked a lot of possible solutions.
Currently i’m quite frustrated, since i’m not quite happy with my outcome and timing.
Maybe you can help, how i should model my data correctly.
The Problem:
I have a lot of events (100.000.000 every month - stored for 3months), which contain a lot of annotated tags (1900 unique / avg. 30 per document ).
Now i simply want to index the tags, so i can search for a tag and retrieve/paginate all results.
# sample data
{"data":{},"tags":{"tag_1":true,"tag_2":true}}
Related Solutions/ what i tried so far:
- Indexing object with dynamic keys within a document - #2 by vsr1
- Flexible Query & Indexing for Flexible JSON Model. - The Couchbase Blog
- Array Indexing | Couchbase Docs
Current Solution:
- In the current solution i create an index of the dictionary (based on the documentation)
CREATE INDEX `idx_data_tags` ON `data`.`_default`.`data`((distinct (array [(`v`.`name`), (`v`.`val`)] for `v` in OBJECT_PAIRS(`tags`) end)))
and use this to search for matches:
SELECT * FROM `data`.`_default`.`data` WHERE (ANY v IN OBJECT_PAIRS(tags) SATISFIES [v.name,v.val] = [$tag ,TRUE] END) LIMIT 1000 OFFSET 1000
Maybe someone can help a desperate soul. I tried arrarys too, but got better result with this method. If you require any more information, i’m happy to help.
Thanks
Result execution plan (click me)
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "DistinctScan",
"limit": "1000",
"offset": "1000",
"scan": {
"#operator": "IndexScan3",
"bucket": "data",
"index": "idx_data_tags",
"index_id": "1eeda9ec5baa7833",
"index_projection": {
"primary_key": true
},
"keyspace": "data",
"namespace": "default",
"scope": "_default",
"spans": [
{
"exact": true,
"range": [
{
"high": "[$tag, true]",
"inclusion": 3,
"index_key": "(distinct (array [(`v`.`name`), (`v`.`val`)] for `v` in object_pairs(`tags`) end))",
"low": "[$tag, true]"
}
]
}
],
"using": "gsi"
}
},
{
"#operator": "Fetch",
"bucket": "data",
"keyspace": "data",
"namespace": "default",
"scope": "_default"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Filter",
"condition": "any `v` in object_pairs((`data`.`tags`)) satisfies ([(`v`.`name`), (`v`.`val`)] = [$tag, true]) end"
},
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "self",
"star": true
}
]
}
]
}
}
]
},
{
"#operator": "Limit",
"expr": "1000"
}
]
}
}