Retrieve the entire graph document with the data documents joined into it.
Retrieve just the “nodes” field also with the data documents joined into it.
I’ve played around with N1QL and have a (slow) solution below:
SELECT graph.id, graph.name, data.data
FROM (
SELECT nodes.*
FROM default
USE KEYS "graph::1"
UNNEST nodes
WHERE default._type = "graph"
) node
JOIN default data ON KEYS node.data_ref
WHERE data._type = "data"
Would love to get some feedback on the query above on the best way to optimize it for performance and how to get #1 without two separate requests, which is how it is currently implemented. Also open to changing the schema if that makes future queries easier/faster!
SELECT graph.*, (SELECT data.* FROM data USE KEYS graph.nodes[*].data_ref) AS node_data
FROM graph;
SELECT (SELECT data.* FROM data USE KEYS graph.nodes[*].data_ref) AS node_data
FROM graph;
Some issues I ran into: The first query doesn’t quite join the “node_data” back into the “nodes” array and the second query grabs just the data documents (without any node data). I’d like to be able to have the node information alongside the data document since they’re related to each other if possible.
SELECT g.*, ARRAY_AGG({"node":n, "data":d}) AS node_data
FROM graph AS g UNNEST g.nodes AS n JOIN data AS d ON KEYS n.data_ref
GROUP BY g;
SELECT ARRAY_AGG({"node":n, "data":d}) AS node_data
FROM graph AS g UNNEST g.nodes AS n JOIN data AS d ON KEYS n.data_ref
GROUP BY g;
Should I have any particular indices to help with the query performance? Joining 375 data documents into the graph using that query took around 6 seconds.
You should use the first two queries. The data in the nodes array is also contained in the data documents, right? In that case, the first two queries should work. If not, we can take another look.
There is some data in the nodes array that is not in the data documents. I’d be open to moving those fields in the data document if that’s the more ideal way to represent this structure.
Ok. So what are your SLAs and expectations? I would also suggest using Couchbase 4.5.1 and using adequate hardware (RAM and cores). Finally, @vsr1 and @keshav_m can suggest if parallelism will help here.
Our expectations would be inline or better than what our current MySQL database currently delivers. We’re setting up a test cluster to run benchmarks on a more realistic scale.
How would we set “pretty” to false? Is this a command line option when starting the server?
You can turn off at query service level by using REST API call. Also you can turn off statement level as query parameter. I cbq shell Issue following command.
\set -pretty false;