Couchbase 7.0 (collections, scopes, indexes): how to update the data model

In previous versions of Couchbase, we used the following steps to model data:

  1. Put everything into one bucket.
  2. To distinguish different types, add a type field to each JSON document and create a single index on bucket(type).
  3. To enforce the uniqueness of certain fields for certain types, use lookup documents. To make lookup documents queryable, give them the fields type = "constraint" and for = "airport", meaning the document is a unique constraint for the airport document type. Add an index on bucket(for).
  4. Add any other indexes at the bucket level.
  5. Create a single primary index on the bucket.
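To make the lookup-document idea in step 3 concrete, here is a minimal sketch, assuming an in-memory dict stands in for the bucket and a hypothetical `<type>::<field>::<value>` key format for constraint documents. None of this is Couchbase SDK API. On a real cluster the existence check and the writes are separate operations, which is why the collections version in step 3 of the new model relies on transactions.

```python
from __future__ import annotations


def constraint_key(doc_type: str, field: str, value: str) -> str:
    """Deterministic key for a lookup (unique-constraint) document.

    The "<type>::<field>::<value>" format is a hypothetical convention.
    """
    return f"{doc_type}::{field}::{value}"


def insert_with_constraints(store: dict, key: str, doc: dict,
                            unique_fields: list[str]) -> bool:
    """Insert doc plus one lookup document per unique field, all or nothing.

    `store` is a plain dict standing in for the bucket (an assumption).
    Returns False and writes nothing if the document key or any
    constraint key is already taken.
    """
    doc_type = doc["type"]
    lookups = {
        constraint_key(doc_type, f, str(doc[f])): {
            "type": "constraint",  # marks this as a lookup document
            "for": doc_type,       # the document type it constrains
            "ref": key,            # back-pointer to the constrained document
        }
        for f in unique_fields
    }
    # Check every key before writing, so a conflict leaves the store untouched.
    if key in store or any(k in store for k in lookups):
        return False
    store.update(lookups)
    store[key] = doc
    return True
```

Because the constraint key is derived from the field value, "is this ICAO code taken?" becomes a key lookup rather than a query.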

With Couchbase 7.0 and collections, we have the following in mind:

  1. Put everything into one bucket and one scope per deployment environment, such as development, staging, or production. For a development deployment, for example, everything then goes into bucket.development.
  2. JSON documents no longer need a type field. Instead, the application checks at runtime whether a collection for that type exists and creates it if not. It then uses that collection's API from the SDK for K/V operations. Queries no longer filter on a type field, since the FROM clause specifies bucket.development.airport for each query, so the index on type is no longer needed.
  3. Lookup documents for a type get their own collection, such as airport-constraints. Lookup documents are placed there using transactions, which ensures atomicity when an airport with unique constraints is inserted into the database. Constraints can now be queried easily without a type index. This means that each type gets two collections: one stores the documents and the other stores the unique constraints.
  4. Add any other indexes at the collection level, for example on bucket.development.airport.
  5. Create a primary index on each collection instead of on the bucket.
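A minimal sketch of the N1QL DDL these steps imply for one environment, assuming a bucket named `travel` and the `<type>-constraints` naming from step 3 (the bucket name and the `idx_<type>_<field>` index names are hypothetical). It only builds statement strings; running them is left to whatever SDK or cbq session you use. The backtick escaping matters because a name like `airport-constraints` would otherwise be parsed as `airport - constraints`.

```python
from typing import Dict, List, Sequence


def keyspace(*parts: str) -> str:
    """Join bucket/scope/collection names into a backtick-escaped N1QL path."""
    return ".".join(f"`{p}`" for p in parts)


def ddl_for_type(bucket: str, scope: str, doc_type: str,
                 secondary_fields: Sequence[str] = ()) -> List[str]:
    """Statements for one document type: two collections plus their indexes."""
    data = keyspace(bucket, scope, doc_type)
    constraints = keyspace(bucket, scope, f"{doc_type}-constraints")
    stmts = [
        f"CREATE COLLECTION {data}",
        f"CREATE COLLECTION {constraints}",
        # One primary index per collection instead of one per bucket.
        f"CREATE PRIMARY INDEX ON {data}",
        f"CREATE PRIMARY INDEX ON {constraints}",
    ]
    # Secondary indexes live on the collection, not the bucket (step 4).
    for field in secondary_fields:
        stmts.append(f"CREATE INDEX `idx_{doc_type}_{field}` ON {data}(`{field}`)")
    return stmts


def ddl_for_environment(bucket: str, scope: str,
                        types: Dict[str, Sequence[str]]) -> List[str]:
    """One CREATE SCOPE per environment, then the per-type statements."""
    stmts = [f"CREATE SCOPE {keyspace(bucket, scope)}"]
    for doc_type, fields in types.items():
        stmts.extend(ddl_for_type(bucket, scope, doc_type, fields))
    return stmts


if __name__ == "__main__":
    for stmt in ddl_for_environment("travel", "development", {"airport": ["name"]}):
        print(stmt)
```

Switching environments is then just a different `scope` argument, which keeps development, staging, and production side by side in the same bucket.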

I would like to ask the community whether any of the above changes are anti-patterns or things that could be designed better for performance.

Additional questions:

  1. Are scopes really not useful for grouping collections together so that we can answer questions like "Give me all documents from bucket.scope where this = that"? When querying, I noticed that it is not possible to specify bucket.scope in the FROM clause: when the scope is specified, the collection must also be specified. Is that correct?

Thanks!

Regarding points 1 to 5, we had exactly that in mind when we designed collections. The only comment I would make is that nothing forces you to move to a collection model: if you find that the previous model suits you, it still works.

Regarding the additional question: yes, query access is only supported per collection.
There are several problems with allowing a FROM clause of the form:

identifier.identifier

To answer your scope access question: the underlying data store layer currently does not allow direct access by scope, only by collection, so we wouldn’t be able to implement bucket.scope access at the query level.
That’s not to say the data layer won’t change in a later version.
That said, it seems to me like you are trying to achieve both a ‘collection’ and ‘type’ kind of data access at the same time, and I wonder whether that approach just leads to an application that doesn’t commit to either model and is therefore more difficult to maintain.

To add more fuel to the fire, allowing identifier.identifier as a FROM term would lead to ambiguities. For instance, there was talk of allowing a query_context of the form namespace:bucket together with a FROM term of the form scope.collection.
This is already an issue, because such a query would only work if the query context were supplied in that specific form.
If we were also to allow the second form (bucket.scope), the same request might query some data with one query context, some other data with another, and fail with a third.

To avoid any misunderstanding about what refers to what, the safest bet for now is to allow only one kind of relative object name, namely collections local to the scope in the query context; everything else has to be accessed by its full path.
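A hypothetical sketch of the naming rule described above, not the query engine's actual implementation: a single name resolves to a collection in the query_context scope, a three-part name is taken as a full path, and a two-part name is rejected as ambiguous rather than guessed. It assumes names contain no dots or backticks and that query_context looks like `bucket.scope`, optionally prefixed with `namespace:`; the bare-name case without a query_context is treated as a plain bucket, as in the pre-collections model.

```python
from typing import Optional


def resolve(from_term: str, query_context: Optional[str] = None) -> str:
    """Resolve a FROM term to a full path under the "one relative form" rule."""
    parts = from_term.split(".")
    if len(parts) == 3:
        # bucket.scope.collection is already a full path.
        return from_term
    if len(parts) == 2:
        # Could be bucket.scope or scope.collection depending on the context,
        # so it is rejected instead of being interpreted.
        raise ValueError(f"ambiguous relative name: {from_term}")
    if len(parts) == 1:
        if query_context is None:
            return from_term  # bare bucket name
        ctx = query_context.split(":", 1)[-1]  # drop any "namespace:" prefix
        return f"{ctx}.{from_term}"  # collection local to the context scope
    raise ValueError(f"too many path components: {from_term}")
```

The point of the sketch is that every accepted input has exactly one meaning for a given query_context, which is the property the bucket.scope form would break.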


@Marco_Greco thank you for taking the time to go through my concerns. Although I was not explicit about my thinking on “scope-level queries”, you hit the bull’s eye with the following:

seems to me like you are trying to achieve both a ‘collection’ and ‘type’ kind of data access at the same time

Let me go off on a tangent about class hierarchies in applications:

Your intuition is right: we usually also have some kind of hierarchy between documents, especially when they are mapped to application classes (for example, JVM classes) and the application wants APIs that are easy to maintain. For example, an abstract Animal would have the subtypes Cat, Dog, and Mouse, all of which are accessed both through type-specialized APIs like Cats.getByName() and through Animals.getByName().

So we thought that if we could have an Animals scope with query access, that would simplify some things at an internal API layer. I understand the ambiguities of a scope-level FROM spec; thank you for clarifying them.

Question:

  1. We have UNION queries in mind for the Animal case. The Animals.getByName() API would simply call getByName() for every subtype and UNION the queries. Would that be a good approach? Are there any design suggestions for cases like this?

That’s exactly what I wanted to elicit: a cool, well-defined use case.
There are things that still need to be worked out, such as how you would know which animal a document is unless there is a type (or animal) field in every document, but it gives us something to think about.
Maybe for the next version?

In the meantime, yes, a UNION would work for you.
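A sketch of the UNION approach for Animals.getByName(), assuming one collection per subtype named `cat`, `dog`, and `mouse` in the environment scope (hypothetical names). Each branch projects a string literal as `animalType`, which is one way to tell the animals apart without storing a type field in the documents. Collection names are interpolated because they come from the application's own class registry; the user-supplied name is passed as the named parameter `$name`.

```python
from typing import Sequence


def get_by_name_query(bucket: str, scope: str, subtypes: Sequence[str]) -> str:
    """Build one UNION ALL query across the per-subtype collections."""
    branches = [
        # The string literal tags each row with the subtype it came from.
        f'SELECT "{t}" AS animalType, a.* '
        f"FROM `{bucket}`.`{scope}`.`{t}` AS a "
        f"WHERE a.`name` = $name"
        for t in subtypes
    ]
    return "\nUNION ALL\n".join(branches)


if __name__ == "__main__":
    print(get_by_name_query("zoo", "development", ["cat", "dog", "mouse"]))
```

UNION ALL is used instead of UNION because UNION adds a duplicate-elimination step, and rows from different branches are already distinguished by `animalType`. Cats.getByName() is then the same query with a single subtype.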
