Joining NoSQL Data: MongoDB Query Language vs Couchbase N1QL

One of the most frequent questions I receive when it comes to NoSQL is on the subject of joining data from multiple documents into a single query result. While this question is brought up more frequently from RDBMS developers, I also receive it from NoSQL developers.

When it comes to data joining, every database does it differently, some of which require it to be done through the application layer, rather than the database layer. We’re going to explore some data joining options between database technologies.

In this blog, we’ll be comparing the process of joining NoSQL documents using the MongoDB $lookup operator versus Couchbase‘s more intuitive N1QL query language.

The Sample Data

For this example, we’ll be basing both MongoDB and Couchbase off two sample documents. Assume we’re working with a classic order and inventory example. For inventory, our documents might look something like this:

{
    "id": "product-1",
    "type": "product",
    "name": "Pokemon Red",
    "price": 29.99
}

{

"id": "product-1",

"type": "product",

"name": "Pokemon Red",

"price": 29.99

}

While flat, the above document can properly explain one particular product. It has a unique id which will be involved during the join process. For orders, we might have a document that looks like the following:

{
    "id": "order-1",
    "type": "order",
    "products": [
        {
            "product_id": "product-1",
            "quantity": 2
        }
    ]
}

{

"id": "order-1",

"type": "order",

"products": [

{

"product_id": "product-1",

"quantity": 2

}

]

}

The goal here will be to join these two documents in a single query using both MongoDB and Couchbase. However, query language aside, these documents can always be joined via the application layer through multiple queries. This is not the result we’re after, though.

Joining Documents with MongoDB and the $lookup Operator

In recent versions of MongoDB, join queries involve a $lookup operator that is part of the aggregation queries. Per the MongoDB documentation, this operator performs as the following:

Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. The $lookup stage does an equality match between a field from the input documents with a field from the documents of the “joined” collection.

To use the $lookup operator, you’d have something like this:

db.collection.aggregate([
    {
       $lookup:
         {
           from: <collection to join>,
           localField: <field from the input documents>,
           foreignField: <field from the documents of the "from" collection>,
           as: <output array field>
         }
    }
])

db.collection.aggregate([

{

$lookup:

{

from: <collection to join>,

localField: <field from the input documents>,

foreignField: <field from the documents of the "from" collection>,

as: <output array field>

}

])

Now this is great, but it doesn’t work on relationships found in arrays. This means that the $lookup operation cannot join the product_id found in the products array to another document. Instead the array must be “unwound” or “unnested” first which adds extra complexity to our query:

db.orders.aggregate([
    { $unwind: "$products" },
    {
        $lookup: {
            from: "products",
            localField: "products.product_id",
            foreignField: "_id",
            as: "productObjects"
        }
    }
])

db.orders.aggregate([

{ $unwind: "$products" },

{

$lookup: {

from: "products",

localField: "products.product_id",

foreignField: "_id",

as: "productObjects"

}

])

The $unwind operator will flatten the array and then do a join on the now flat objects that were produced. The result of such query would look like this:

{
    "_id" : ObjectId("58a3869acbf64c4ace55e713"),
    "products" : {
        "product_id" : ObjectId("58a3851b2f14a900caa7a731"),
        "quantity" : 2
    },
    "productObjects" : [
        {
            "_id" : ObjectId("58a3851b2f14a900caa7a731"),
            "name" : "Pokemon Red",
            "price" : 29.99
        }
    ]
}

{

"_id" : ObjectId("58a3869acbf64c4ace55e713"),

"products" : {

"product_id" : ObjectId("58a3851b2f14a900caa7a731"),

"quantity" : 2

"productObjects" : [

{

"_id" : ObjectId("58a3851b2f14a900caa7a731"),

"name" : "Pokemon Red",

"price" : 29.99

}

]

}

Had there been more than one reference in the array, there would have been more results returned. However, what is returned isn’t very attractive. We still have the old products object and now a productsObject array. Further manipulations to the data stream needs to happen.

The productsObject array should be “unwound” and then reconstructed to how we want it. This can be accomplished by doing the following:

db.orders.aggregate([
    { $unwind: "$products" },
    {
        $lookup: {
            from: "products",
            localField: "products.product_id",
            foreignField: "_id",
            as: "productObjects"
        }
    },
    { $unwind: "$productObjects"},
    {
        $project: {
            products: {
                "quantity": "$products.quantity",
                "name": "$productObjects.name",
                "price": "$productObjects.price"
            }
        }
    }
])

db.orders.aggregate([

{ $unwind: "$products" },

{

$lookup: {

from: "products",

localField: "products.product_id",

foreignField: "_id",

as: "productObjects"

}

{ $unwind: "$productObjects"},

{

$project: {

products: {

"quantity": "$products.quantity",

"name": "$productObjects.name",

"price": "$productObjects.price"

}

])

Notice that the aggregate query is now getting more complex. After doing the join, the result is “unwound” and then the result is reconstructed using the $project operator.

At this point further manipulations to the result can be made such as grouping the results so that the products objects become a single array again. Each manipulation to the data set requires more aggregation code which can easily become messy, complicated, and difficult to read.

This is where Couchbase N1QL becomes so much more pleasant to work with.

Using Couchbase and N1QL to Join NoSQL Documents

Let’s use the same document example that we used for MongoDB. This time we’re going to write SQL queries with N1QL to get the job done.

The first thing that comes to mind might be to use a JOIN in SQL. Our query might look something like this:

SELECT orders.*, product
FROM example AS orders
JOIN example AS product ON KEYS orders.products[*].product_id
WHERE orders.type = 'order'

SELECT orders.*, product

FROM example AS orders

JOIN example AS product ON KEYS orders.products[*].product_id

WHERE orders.type = 'order'

In the above example, both the documents exist in the same Couchbase Bucket. A JOIN against document ids happens based on the product_id values found in the products array. The above query would yield results that look like this:

[
  {
    "id": "order-1",
    "product": {
      "id": "product-1",
      "name": "Pokemon Red",
      "price": 29.99,
      "type": "product"
    },
    "products": [
      {
        "product_id": "product-1",
        "quantity": 2
      }
    ],
    "type": "order"
  }
]

[

{

"id": "order-1",

"product": {

"id": "product-1",

"name": "Pokemon Red",

"price": 29.99,

"type": "product"

"products": [

{

"product_id": "product-1",

"quantity": 2

}

"type": "order"

}

]

Like with MongoDB, there will be a result for every item of the products array that matches. In fairness, while the N1QL version was easier to write, it wasn’t necessarily any more difficult than the MongoDB Query Language at this point. As we manipulate the data more, Couchbase becomes a lot easier by comparison.

For example, let’s say we wanted to clean up the results:

SELECT orders.id, orders.type, OBJECT_PUT(product, "quantity", products.quantity) AS product
FROM example AS orders
UNNEST orders.products AS products
JOIN example AS product ON KEYS products.product_id
WHERE orders.type = 'order'

SELECT orders.id, orders.type, OBJECT_PUT(product, "quantity", products.quantity) AS product

FROM example AS orders

UNNEST orders.products AS products

JOIN example AS product ON KEYS products.product_id

WHERE orders.type = 'order'

There are some major differences in what we’re doing in the above, but minor differences in how we’re doing them. Instead of joining directly on the array, we are first flattening or “unnesting” the array, like what we saw in the MongoDB $unwind operator. The join is now happening on each of the flattened results. Finally, the quantity from the original object is added to the new object.

The result to the above query would look something like this:

[
  {
    "id": "order-1",
    "product": {
      "id": "product-1",
      "name": "Pokemon Red",
      "price": 29.99,
      "quantity": 2,
      "type": "product"
    },
    "type": "order"
  }
]

[

{

"id": "order-1",

"product": {

"id": "product-1",

"name": "Pokemon Red",

"price": 29.99,

"quantity": 2,

"type": "product"

"type": "order"

}

]

Let’s say that the original products array had more than one product reference in it. Instead of returning several objects based on the JOIN criteria we saw above, it might make sense to re-pack that original array.

SELECT orders.id, orders.type, ARRAY_AGG(OBJECT_PUT(product, "quantity", products.quantity)) AS products
FROM example AS orders
UNNEST orders.products AS products
JOIN example AS product ON KEYS products.product_id
WHERE orders.type = 'order'
GROUP BY orders

SELECT orders.id, orders.type, ARRAY_AGG(OBJECT_PUT(product, "quantity", products.quantity)) AS products

FROM example AS orders

UNNEST orders.products AS products

JOIN example AS product ON KEYS products.product_id

WHERE orders.type = 'order'

GROUP BY orders

In the above query we’ve only added ARRAY_AGG and a GROUP BY, but as a result, each joined document shows up in the products array instead of the id value.

Don’t want to use an actual JOIN operator? Try using a SQL subquery instead.

Conclusion

In NoSQL, joining data is a very popular concern for developers that are seasoned RDBMS veterans. Because MongoDB is a very popular NoSQL technology, I thought it would be good to use for comparing to how Couchbase handles document joining. For light operations, $lookup operator is tolerable, but as join queries in MongoDB become more complex, you may need to take a step back. With N1QL, writing complex queries that include joining operations becomes very easy and stays easy regardless of how complex the query is.

For more information on N1QL and Couchbase, visit the Couchbase Developer Portal.

Nic Raboy, Developer Advocate, Couchbase

Platform

Self-Managed

Services

Capabilities

Why Couchbase?

Migrate to Capella

By Use Case

By Industry

By Application Need

Popular Docs

By Developer Role

Quickstart

Resource Center

About

Partnerships

Our Services

Partners: Register a Deal

Ready to register a deal with Couchbase?

Marriott

All Posts

Joining NoSQL Data: MongoDB Query Language vs Couchbase N1QL

The Sample Data

Joining Documents with MongoDB and the $lookup Operator

Using Couchbase and N1QL to Join NoSQL Documents

Conclusion

Author

Posted by Nic Raboy, Developer Advocate, Couchbase

Leave a reply Cancel reply