State of the art
Relationships between documents are lost in NoSQL databases, and the most common approach I know of is to embed documents inside others. For example, a group may contain many users:
group::1
{
"name": "group 1",
"users": [
{
"name": "user 1",
...
},
{
"name": "user 2",
...
},
...
]
}
This structure brings unwanted problems: for example, updating a user means rewriting the group document that contains it, and with it all the other embedded users. IMHO this is a bad technique for use cases like this one, because each entity is independent of the others, so there should be one document per entity:
group::1
{
"name": "group 1"
}
user::1
{
"name": "user 1"
}
user::2
{
"name": "user 2"
}
Problem
Relationships are lost in document-oriented databases. How can we make up for this gap?
Is there a way to map relations in NoSQL?
Graph Theory vs Document Oriented Database (DOD)
In a graph there are two kinds of entities: vertices and edges, as shown in this picture (thanks to Wikipedia):
Vertices (nodes) are the points of a graph, and edges (relations) are the connections between those points. I think a DOD is very similar to a graph: it contains documents, which are the vertices. However, a DOD has no edges. I think this is a real gap, because without edges a DOD can't use graph properties; for example, you can't run Dijkstra's or Prim's algorithm on a DOD.
Solution
By creating relation documents we can map a graph inside a DOD, and by using Couchbase views we can have a fully operating graph of documents! Here are the relation documents, each containing the keys of two documents from the previous example:
relation::uuid
["group::1", "user::1"]
relation::uuid
["group::1", "user::2"]
Here is the view needed to manage binary undirected relations:
function (doc, meta) {
  // This check is only needed if relations are stored in the same bucket as the documents
  if (Array.isArray(doc)) {
    if (doc.length === 2) {
      // Emit both directions so the relation can be looked up from either side
      emit(doc[0], doc[1]);
      emit(doc[1], doc[0]);
    }
  }
}
Finally, the view contains every relation in both directions:
"user::1" -> "group::1"
"group::1" -> "user::1"
"user::2" -> "group::1"
"group::1" -> "user::2"
So we can get all the documents related to a user and, with little effort, all the documents of a given type related to another document.
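As a rough illustration of these lookups, the sketch below replays the map function over the two relation documents in plain JavaScript, with no Couchbase involved. The `runView`, `related` and `relatedOfType` helpers and the `relation::a` / `relation::b` keys are hypothetical names chosen for the example, and filtering by type assumes that keys follow the `type::id` convention used in this post.

```javascript
// The map function from the view, with emit passed in so it can run locally.
function map(doc, meta, emit) {
  if (Array.isArray(doc) && doc.length === 2) {
    emit(doc[0], doc[1]);
    emit(doc[1], doc[0]);
  }
}

// Hypothetical stand-in for a bucket: key -> document body.
const bucket = {
  "group::1": { name: "group 1" },
  "user::1": { name: "user 1" },
  "user::2": { name: "user 2" },
  "relation::a": ["group::1", "user::1"],
  "relation::b": ["group::1", "user::2"],
};

// Runs the map function over every document and collects the emitted rows.
function runView(docs) {
  const rows = [];
  for (const [id, doc] of Object.entries(docs)) {
    map(doc, { id }, (key, value) => rows.push({ key, value }));
  }
  return rows;
}

// All keys related to the given key.
function related(rows, key) {
  return rows.filter((row) => row.key === key).map((row) => row.value);
}

// Only the related keys of one type, relying on the "type::id" key convention.
function relatedOfType(rows, key, type) {
  return related(rows, key).filter((k) => k.startsWith(type + "::"));
}

const rows = runView(bucket);
console.log(related(rows, "group::1")); // [ 'user::1', 'user::2' ]
console.log(relatedOfType(rows, "user::1", "group")); // [ 'group::1' ]
```

Against a real bucket the filtering would be done by querying the view with the document key rather than scanning an array, but the shape of the result is the same.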
We can also extend the concept of traditional relations. For example, this situation:
which is mapped with these documents:
// Documents
group::1 { ... }
group::2 { ... }
user::1 { ... }
user::2 { ... }
role::admin { ... }
role::user { ... }
// Relations
["group::1", "user::1"]
["group::2", "user::1"]
["group::2", "user::2"]
["role::admin", "user::1"]
["role::user", "user::2"]
This mapping doesn't let you manage a user who has different roles in different groups. Imagine that user::2 needs to administer group::1, but must not have the global role role::admin. In a relational database you would probably have to change the schema. With this approach you can simply add a relation of another kind:
["group::1", "role::admin", "user::2"]
Of course, you have to change your application logic, but that is unavoidable! And you must manage referential integrity yourself.
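One way the application logic might use such a ternary relation is sketched below: a group-scoped role, if present, takes precedence over the user's global role. The precedence rule, the `roleIn` and `isRole` helpers and the in-memory `relations` array are all assumptions made for illustration; they are not part of the setup I tested.

```javascript
// Relations from the example above, plus the ternary [group, role, user] relation.
const relations = [
  ["group::1", "user::1"],
  ["group::2", "user::1"],
  ["group::2", "user::2"],
  ["role::admin", "user::1"],
  ["role::user", "user::2"],
  ["group::1", "role::admin", "user::2"],
];

// Role documents are recognised by their key prefix (assumed convention).
const isRole = (key) => key.startsWith("role::");

// Returns the role a user holds in a group: a group-scoped (ternary) role wins,
// otherwise the global (binary) role applies. Returns null if neither exists.
// Note: group membership itself is not checked here.
function roleIn(allRelations, user, group) {
  const scoped = allRelations.find(
    (r) => r.length === 3 && r.includes(user) && r.includes(group)
  );
  if (scoped) return scoped.find(isRole) || null;

  const globalRelation = allRelations.find(
    (r) => r.length === 2 && r.includes(user) && r.some(isRole)
  );
  return globalRelation ? globalRelation.find(isRole) : null;
}

console.log(roleIn(relations, "user::2", "group::1")); // role::admin
console.log(roleIn(relations, "user::2", "group::2")); // role::user
console.log(roleIn(relations, "user::1", "group::1")); // role::admin
```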
Conclusion
From the tests I've done, this approach works well for managing document relations and can open the door to graph-theory operations. Relations are independent of documents, so they can be created between any pair of documents. Relations aren't tied to document type and can contain a different number of documents (for example three documents, which extends the concept of a graph, as shown above).
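To give one concrete idea of what a graph operation over relation documents could look like, here is a minimal breadth-first search that finds a shortest chain of relations between two documents. It works on an in-memory adjacency map built from binary relations; the `buildAdjacency` and `shortestPath` names are hypothetical, and in a real setup each neighbour lookup would presumably be a view query instead.

```javascript
// Builds an undirected adjacency map from binary relations,
// mirroring the two emits of the view.
function buildAdjacency(binaryRelations) {
  const adjacency = new Map();
  const link = (from, to) => {
    if (!adjacency.has(from)) adjacency.set(from, []);
    adjacency.get(from).push(to);
  };
  for (const relation of binaryRelations) {
    if (Array.isArray(relation) && relation.length === 2) {
      link(relation[0], relation[1]);
      link(relation[1], relation[0]);
    }
  }
  return adjacency;
}

// Breadth-first search: returns the shortest list of keys from start to goal,
// or null when the two documents are not connected.
function shortestPath(adjacency, start, goal) {
  const previous = new Map([[start, null]]);
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === goal) {
      const path = [];
      for (let key = goal; key !== null; key = previous.get(key)) path.unshift(key);
      return path;
    }
    for (const next of adjacency.get(current) || []) {
      if (!previous.has(next)) {
        previous.set(next, current);
        queue.push(next);
      }
    }
  }
  return null;
}

const graph = buildAdjacency([
  ["group::1", "user::1"],
  ["group::2", "user::1"],
  ["group::2", "user::2"],
]);
console.log(shortestPath(graph, "group::1", "user::2"));
// [ 'group::1', 'user::1', 'group::2', 'user::2' ]
```

Since relations here carry no weights, breadth-first search is enough; Dijkstra would only be needed if relation documents also stored a cost.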
Last but not least, with a bit of programming effort we can have joins between entities, implemented with multiple MultiGet calls and view queries, so joins could be almost entirely in-RAM operations (IMHO, but this should be tested).
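The join idea can be sketched as two steps: ask the view for the keys related to a document, then fetch all of those documents in one batch. The code below uses plain in-memory stand-ins (`viewRows`, `store`, `multiGet` and `join` are hypothetical names, not Couchbase SDK calls), so it only illustrates the shape of the operation and says nothing about real performance.

```javascript
// Stand-in for the view output: rows of { key, value } as emitted by the map function.
const viewRows = [
  { key: "group::1", value: "user::1" },
  { key: "user::1", value: "group::1" },
  { key: "group::1", value: "user::2" },
  { key: "user::2", value: "group::1" },
];

// Stand-in for the bucket holding the entity documents.
const store = new Map([
  ["group::1", { name: "group 1" }],
  ["user::1", { name: "user 1" }],
  ["user::2", { name: "user 2" }],
]);

// Stand-in for a batched read: returns the documents found for the given keys.
function multiGet(keys) {
  return keys
    .filter((key) => store.has(key))
    .map((key) => ({ key, doc: store.get(key) }));
}

// "Join": the document itself plus every related document of the requested type.
function join(key, type) {
  const relatedKeys = viewRows
    .filter((row) => row.key === key && row.value.startsWith(type + "::"))
    .map((row) => row.value);
  const [root] = multiGet([key]);
  return { ...root, [type + "s"]: multiGet(relatedKeys) };
}

const joined = join("group::1", "user");
console.log(JSON.stringify(joined, null, 2));
```

Here `joined` holds the group document together with a `users` array containing both user documents, which is roughly what a relational join of groups and users would return.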
At the moment I need it (and so I’ve tested it) only for binary relations, but it can be used to manage n-ary relationships or directed relationships.
- What do you think about this vision of Couchbase?
- Have you tried something similar in your work? If yes, how does it work?
- Why is this a bad solution? (I'm interested in all the possible problems, more than the benefits, because data integrity is the primary objective)