Introduction
N1QL support for ANSI JOIN was first introduced in version 5.5. Currently there is a restriction in ANSI JOIN (version 5.5 and 6.0) in that the right-hand side of an ANSI JOIN must be a keyspace. This restriction will be lifted in version 6.5.
N1QL support for MERGE statement currently uses ON KEYS clause, similar to lookup join. ANSI MERGE syntax will be supported in version 6.5.
ANSI JOIN Enhancements
In Couchbase Server version 6.5, N1QL adds support for expression and subquery as right-hand side of an ANSI JOIN. Previously in version 5.5 and 6.0, the right-hand side of an ANSI JOIN must be a keyspace. An error is returned if the right-hand side of an ANSI JOIN is not a keyspace.
The restriction for keyspace only exists in version 5.5 and 6.0, and it only applies to the right-hand side of an ANSI JOIN. The left-hand side of an ANSI JOIN can be a keyspace, an expression, a subquery, or another join/nest/unnest operation, as long as ANSI JOIN is not mixed with a non-ANSI join/nest operation.
Expression as right-hand side of an ANSI JOIN
An expression can now be used on the right-hand side of ANSI JOIN. For example:
1 2 3 4 5 6 |
SELECT DISTINCT route.destinationairport FROM `travel-sample` airport JOIN [ {"destinationairport": "KEF", "sourceairport": "SFO", "type": "route"}, {"destinationairport": "KEF", "sourceairport": "LHR", "type": "route"} ] AS route ON airport.faa = route.sourceairport AND route.type = "route" WHERE airport.type = "airport" AND airport.city = "San Francisco"; |
In this example, an explicit array of objects is specified as an expression on the right-hand side of an ANSI JOIN. Note that when an expression is used on the right-hand side of an ANSI JOIN, the expression must have an alias (“route” in the example above, although the AS keyword is optional).
When an expression is used on the right-hand side of an ANSI JOIN, an ExpressionScan operator is used to iterate through the result set of the expression in the explain plan:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 |
"plan": { "#operator": "Sequence", "~children": [ { "#operator": "IndexScan3", "as": "airport", "index": "ix_airport_city", "index_id": "eee67e7e615a1b49", "index_projection": { "primary_key": true }, "keyspace": "travel-sample", "namespace": "default", "spans": [ { "exact": true, "range": [ { "high": "\"San Francisco\"", "inclusion": 3, "low": "\"San Francisco\"" } ] } ], "using": "gsi" }, { "#operator": "Fetch", "as": "airport", "keyspace": "travel-sample", "namespace": "default" }, { "#operator": "HashJoin", "build_aliases": [ "route" ], "build_exprs": [ "(`route`.`sourceairport`)" ], "on_clause": "(((`airport`.`faa`) = (`route`.`sourceairport`)) and ((`route`.`type`) = \"route\"))", "probe_exprs": [ "(`airport`.`faa`)" ], "~child": { "#operator": "Sequence", "~children": [ { "#operator": "ExpressionScan", "alias": "route", "expr": "[{\"destinationairport\": \"KEF\", \"sourceairport\": \"SFO\", \"type\": \"route\"}, {\"destinationairport\": \"KEF\", \"sourceairport\": \"LHR\", \"type\": \"route\"}]", "uncorrelated": true } ] } }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Filter", "condition": "(((`airport`.`type`) = \"airport\") and ((`airport`.`city`) = \"San Francisco\"))" }, { "#operator": "InitialProject", "distinct": true, "result_terms": [ { "expr": "(`route`.`destinationairport`)" } ] }, { "#operator": "Distinct" } ] } }, { "#operator": "Distinct" } ] }, |
Subquery as right-hand side of an ANSI JOIN
A subquery can also be used on the right-hand side of ANSI JOIN. For example:
1 2 3 4 5 6 |
SELECT DISTINCT route.destinationairport FROM `travel-sample` airport JOIN ( SELECT destinationairport, sourceairport FROM `travel-sample` WHERE type = "route" ) AS route ON airport.faa = route.sourceairport WHERE airport.type = "airport" AND airport.city = "San Francisco"; |
In this example, a subquery is used on the right-hand side of an ANSI JOIN. Similar to an expression, when a subquery is used on the right-hand side of an ANSI JOIN, the subquery must also have an alias (“route” in the example above, although the AS keyword is optional).
When a subquery is used on the right-hand side of an ANSI JOIN, the explain shows the plan for the subquery itself:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 |
"plan": { "#operator": "Sequence", "~children": [ { "#operator": "IndexScan3", "as": "airport", "index": "ix_airport_city", "index_id": "eee67e7e615a1b49", "index_projection": { "primary_key": true }, "keyspace": "travel-sample", "namespace": "default", "spans": [ { "exact": true, "range": [ { "high": "\"San Francisco\"", "inclusion": 3, "low": "\"San Francisco\"" } ] } ], "using": "gsi" }, { "#operator": "Fetch", "as": "airport", "keyspace": "travel-sample", "namespace": "default" }, { "#operator": "HashJoin", "build_aliases": [ "route" ], "build_exprs": [ "(`route`.`sourceairport`)" ], "on_clause": "((`airport`.`faa`) = (`route`.`sourceairport`))", "probe_exprs": [ "(`airport`.`faa`)" ], "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Sequence", "~children": [ { "#operator": "IndexScan3", "index": "ix_type", "index_id": "d925e49b3a11ae3d", "index_projection": { "primary_key": true }, "keyspace": "travel-sample", "namespace": "default", "spans": [ { "exact": true, "range": [ { "high": "\"route\"", "inclusion": 3, "low": "\"route\"" } ] } ], "using": "gsi" }, { "#operator": "Fetch", "keyspace": "travel-sample", "namespace": "default" }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Filter", "condition": "((`travel-sample`.`type`) = \"route\")" }, { "#operator": "InitialProject", "result_terms": [ { "expr": "(`travel-sample`.`destinationairport`)" }, { "expr": "(`travel-sample`.`sourceairport`)" } ] } ] } } ] }, { "#operator": "Alias", "as": "route" } ] } }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Filter", "condition": "(((`airport`.`type`) = \"airport\") and ((`airport`.`city`) = \"San Francisco\"))" }, { "#operator": "InitialProject", "distinct": true, "result_terms": [ { "expr": "(`route`.`destinationairport`)" } ] }, { "#operator": "Distinct" } ] } }, { "#operator": "Distinct" } ] }, |
When a subquery is used as right-hand side of an ANSI JOIN, it cannot be a correlated subquery. An error will be returned if a correlated subquery is used on the right-hand side of an ANSI JOIN.
Hash join used when an expression or a subquery is on the right-hand side of an ANSI JOIN
When a keyspace is used on the right-hand side of an ANSI JOIN, nested-loop join is the default join method, and hash join is considered when USE HASH hint is specified on the right-hand side keyspace. In addition, an appropriate secondary index on the keyspace is required when nested-loop join is used. In contrast, when an expression or a subquery is used on the right-hand side of an ANSI JOIN, since there is no index on an expression or a subquery, it is more efficient to use a hash join to perform the join. A nested-loop join in this case is effectively a cartesian join, since the result set of the expression or subquery needs to be repeatedly scanned for each outer document. As a result, when an expression or a subquery is on the right-hand side of an ANSI JOIN, hash join is the default join method, even without USE HASH hint specified. If hash join cannot be used, e.g. there is no equality join predicate, or when USE NL hint is used, or when community edition is being used (hash join is only available in enterprise edition), then nested-loop join is considered. Looking at the explain files above for cases of both expression and subquery as right-hand side of an ANSI JOIN, you can find a HashJoin operator used in both explain files.
When a keyspace is used on the right-hand side of an ANSI JOIN, a USE INDEX or USE KEYS hint can be specified on the keyspace. When an expression or a subquery is used on the right-hand side of an ANSI JOIN, you cannot specify a USE INDEX or USE KEYS hint on the expression or subquery, for obvious reasons. The only hint available in this case is a join hint (USE HASH or USE NL).
Support for ANSI NEST
The same same enhancement applies to ANSI NEST as well, i.e., you can now use an expression or a subquery as the right-hand side of an ANSI NEST.
ANSI MERGE
A MERGE statement allows documents from the “source” to be merged into the “target”, i.e., when a match is found, the target document can be updated; when a match is not found, a document can be inserted into the target.
The concept of a “match” is determined by the match condition. In previous versions of Couchbase N1QL only supports a “match” based on document key, i.e., the source must produce a document key for the target, to determine whether a match is found (whether the document with that document key already exists in the target). This is achieved by the ON KEYS clause in the MERGE statement syntax.
The MERGE statement with the ON KEYS clause is referred to as look-up merge. This is similar to the join syntax before ANSI JOIN support, i.e., look-up join, which also requires joining on document key, and uses ON KEYS clause. Therefore a natural extension of the MERGE statement is to use ANSI MERGE syntax, i.e., use ON clause to specify the merge condition, such that an arbitrary expression can be used to determine whether a match exists or not. ANSI MERGE is supported in N1QL in Couchbase server version 6.5.
Here is an example of an ANSI MERGE statement:
1 2 3 4 5 6 7 |
MERGE INTO `travel-sample` AS route USING `travel-sample` AS airport ON route.sourceairport = airport.faa AND airport.type = "airport" AND route.type = "route" WHEN MATCHED THEN UPDATE SET route.old_equipment = route.equipment, route.equipment = "797", route.updated = true WHERE route.airline = "BA" AND airport.country = "France" AND CONTAINS(route.equipment, "319"); |
In this example the merge condition is specified using an ON clause. This is very similar to the ON clause of an ANSI JOIN. In fact internally the ANSI MERGE statement uses ANSI JOIN with the same ON clause to determine whether a match exists for any source document.
The explain for the merge statement above:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 |
"plan": { "#operator": "Sequence", "~children": [ { "#operator": "IndexScan3", "as": "airport", "index": "ix_type", "index_id": "d925e49b3a11ae3d", "index_projection": { "primary_key": true }, "keyspace": "travel-sample", "namespace": "default", "spans": [ { "exact": true, "range": [ { "high": "\"airport\"", "inclusion": 3, "low": "\"airport\"" } ] } ], "using": "gsi" }, { "#operator": "Fetch", "as": "airport", "keyspace": "travel-sample", "namespace": "default" }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "NestedLoopJoin", "alias": "route", "on_clause": "((((`route`.`sourceairport`) = (`airport`.`faa`)) and ((`airport`.`type`) = \"airport\")) and ((`route`.`type`) = \"route\"))", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "IndexScan3", "as": "route", "index": "ix_type", "index_id": "d925e49b3a11ae3d", "index_projection": { "primary_key": true }, "keyspace": "travel-sample", "namespace": "default", "nested_loop": true, "spans": [ { "exact": true, "range": [ { "high": "\"route\"", "inclusion": 3, "low": "\"route\"" } ] } ], "using": "gsi" }, { "#operator": "Fetch", "as": "route", "keyspace": "travel-sample", "namespace": "default", "nested_loop": true } ] } }, { "#operator": "Merge", "as": "route", "keyspace": "travel-sample", "namespace": "default", "update": { "#operator": "Sequence", "~children": [ { "#operator": "Filter", "condition": "((((`route`.`airline`) = \"BA\") and ((`airport`.`country`) = \"France\")) and contains((`route`.`equipment`), \"319\"))" }, { "#operator": "Clone" }, { "#operator": "Set", "set_terms": [ { "path": "(`route`.`old_equipment`)", "value": "(`route`.`equipment`)" }, { "path": "(`route`.`equipment`)", "value": "\"797\"" }, { "path": "(`route`.`updated`)", "value": "true" } ] }, { "#operator": "SendUpdate", "alias": "route", "keyspace": "travel-sample", "namespace": "default" } ] } } ] } }, { "#operator": "Discard" } ] }, |
In the explain you can see a NestedLoopJoin operator which is used for the (internally transformed) ANSI JOIN operation between the source and the target.
Join hint can be specified on the source:
1 2 3 4 5 6 7 |
MERGE INTO `travel-sample` AS route USING `travel-sample` AS airport USE HASH(build) ON route.sourceairport = airport.faa AND airport.type = "airport" AND route.type = "route" WHEN MATCHED THEN UPDATE SET route.old_equipment = route.equipment, route.equipment = "797", route.updated = true WHERE route.airline = "BA" AND airport.country = "France" AND CONTAINS(route.equipment, "319"); |
In this example USE HASH hint is specified on airport.
Handling multiple matches in ANSI MERGE
In look-up merge, since the merge condition is on document key, there is only a single potential target document match for each source document. In ANSI MERGE, since the merge condition is now an arbitrary expression, there could be potentially multiple target document matches for each source document. For each merge action specified, the merge action can only be executed at most once for each source document. In case of multiple matches, if a merge action is attempted a second time for the same source document, an error will be returned (error code 5320 or 5330). If such an error is encountered, the merge condition (ON clause) needs to be refined to avoid multiple matches for the same source document.
INSERT action for ANSI MERGE
In look-up merge, if an INSERT action is specified, only the document being inserted is provided. Since look-up merge requires a match on document key, and the INSERT action is only triggered when no match is found (i.e., no document with that document key exists in target), the newly inserted document will simply use the same document key. For ANSI MERGE, this is no longer true, since a document key is no longer required to be the match condition. As a result, when an INSERT action is specified, a document key must also be specified as part of the INSERT action. For example:
1 2 3 4 5 |
MERGE INTO `travel-sample` AS route USING `travel-sample` AS airport ON route.sourceairport = airport.faa AND airport.type = "airport" AND route.type = "route" WHEN NOT MATCHED THEN INSERT (KEY UUID(), VALUE {"sourceairport": airport.faa, "targetairport": "SFO", "type": "route"}) WHERE airport.country = "France"; |
In comparison, here is the syntax for INSERT action for look-up merge:
1 |
WHEN NOT MATCHED THEN INSERT <expr> |
where <expr> is the document to be inserted.
Here is the syntax for INSERT action for ANSI MERGE:
1 2 |
WHEN NOT MATCHED THEN INSERT (<key_expr>, <value_expr>) WHEN NOT MATCHED THEN INSERT (KEY <key_expr>, VALUE <value_expr>) |
where <key_expr> specifies the document key for the newly inserted document, and <value_expr> specifies the new document. The two are separated by a comma.
There are two forms that can be used, the KEY and VALUE keywords are optional. A new pair of parentheses is also required.
The other merge actions (UPDATE action, DELETE action) remains the same between look-up merge and ANSI MERGE.
Summary
The enhancements for ANSI JOIN and support for ANSI MERGE in Couchbase 6.5 enhances ANSI compliance of N1QL, and makes N1QL more user friendly, especially for migration from a relational database.