Collections are a new feature in Couchbase 6.5. They let you group similar documents within each bucket, just as tables in relational databases collect similar records within each database. Collections will be fully supported in Couchbase 7.0, but you can try them right now in the Couchbase 6.5 release as a Developer Preview feature. The Demo Application already uses them.
What Are Collections?
If you are coming from the world of relational databases, you can think of collections as tables. All the documents within a Couchbase collection should be of the same type, just as all the records in a relational table are of the same type. There might be a “customer” table or a “product” table in a relational schema; similarly, there might be a “customer” collection in a Couchbase bucket.
In older versions of Couchbase, data was organized like this:
- Cluster
  - Bucket
    - Document
In Couchbase 6.5, there are two more layers, like this:
- Cluster
  - Bucket
    - Scope
      - Collection
        - Document
How Are Collections Useful?
Collections are the lowest level of document organization, and directly contain documents. They are useful because they let you group documents more precisely than was possible before. Rather than dumping all different types of documents (products, orders, customers) into a single bucket and distinguishing them by a type field, you can instead create a collection for each type. And when you query, you can query against the collection, not just the whole bucket. You will also eventually be able to control access at the collection level.
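The Java sketch below makes the contrast concrete without a cluster. Plain maps stand in for Couchbase. The first layout stores everything in one bucket-like map and filters on a `type` field. The second keeps one map per document type, so the container itself says what the documents are. The class, method names, and sample documents are hypothetical, and no Couchbase SDK calls are involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration using plain maps; this is not Couchbase SDK code.
class TypeFieldVsCollections {

    // Old layout: every document sits in one bucket and carries a "type" field.
    static Map<String, Map<String, Object>> singleBucket() {
        Map<String, Map<String, Object>> bucket = new HashMap<>();
        bucket.put("c1", doc("type", "customer", "name", "Ada"));
        bucket.put("p1", doc("type", "product", "name", "Kettle"));
        bucket.put("o1", doc("type", "order", "customer", "c1"));
        return bucket;
    }

    // Every lookup by kind has to scan the bucket and check the type field.
    static List<Map<String, Object>> filterByType(Map<String, Map<String, Object>> bucket, String type) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> d : bucket.values()) {
            if (type.equals(d.get("type"))) {
                out.add(d);
            }
        }
        return out;
    }

    // Collection-style layout: one container per kind, so no type field is needed.
    static Map<String, Map<String, Map<String, Object>>> perTypeCollections() {
        Map<String, Map<String, Map<String, Object>>> collections = new HashMap<>();
        collections.computeIfAbsent("customers", k -> new HashMap<>()).put("c1", doc("name", "Ada"));
        collections.computeIfAbsent("products", k -> new HashMap<>()).put("p1", doc("name", "Kettle"));
        collections.computeIfAbsent("orders", k -> new HashMap<>()).put("o1", doc("customer", "c1"));
        return collections;
    }

    // Helper that builds a document from alternating key/value arguments.
    static Map<String, Object> doc(Object... kv) {
        Map<String, Object> d = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            d.put((String) kv[i], kv[i + 1]);
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(filterByType(singleBucket(), "customer").size());
        System.out.println(perTypeCollections().get("customers").size());
    }
}
```

Both approaches find the one customer. The collection layout does it without scanning unrelated documents or storing a discriminator field.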
Scopes are the level of organization above collections. Scopes contain collections and collections contain documents. There are different ways to use scopes, depending on what the Couchbase cluster is being used for. If it is supporting many different internal applications for a company, each application should have a scope of its own. If the cluster is being used to serve many client organizations, each running its own copy of an application, each copy should have a scope of its own. Similarly, if a cluster is being used by dev groups, perhaps for testing, the unit of allocation should be a scope. In each case, the owner can then create whatever collections they want under the scope they are assigned.
Scope names have to be unique within their bucket, and collection names have to be unique within their scope. Because collection names only need to be unique within a scope, the “default” bucket could contain two scopes, “dev” and “prod”, each with its own “products” and “customers” collections.
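The uniqueness rules can be illustrated with a small in-memory model. This Java sketch is hypothetical: `BucketModel`, `ScopeModel` and `CollectionModel` are not Couchbase SDK types. The model rejects duplicate names at each level and builds the “dev”/“prod” example described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative in-memory model only; class and method names are hypothetical,
// not part of the Couchbase SDK.
class KeyspaceModel {

    // A collection holds documents keyed by ID.
    static class CollectionModel {
        final Map<String, String> documents = new LinkedHashMap<>();
    }

    // A scope holds collections; collection names must be unique within the scope.
    static class ScopeModel {
        final Map<String, CollectionModel> collections = new LinkedHashMap<>();

        CollectionModel createCollection(String name) {
            if (collections.containsKey(name)) {
                throw new IllegalArgumentException("Collection already exists in scope: " + name);
            }
            CollectionModel c = new CollectionModel();
            collections.put(name, c);
            return c;
        }
    }

    // A bucket holds scopes; scope names must be unique within the bucket.
    static class BucketModel {
        final Map<String, ScopeModel> scopes = new LinkedHashMap<>();

        ScopeModel createScope(String name) {
            if (scopes.containsKey(name)) {
                throw new IllegalArgumentException("Scope already exists in bucket: " + name);
            }
            ScopeModel s = new ScopeModel();
            scopes.put(name, s);
            return s;
        }
    }

    // Builds the example: "dev" and "prod" scopes, each with its own
    // "products" and "customers" collections.
    static BucketModel buildExample() {
        BucketModel bucket = new BucketModel();
        for (String scopeName : new String[] {"dev", "prod"}) {
            ScopeModel scope = bucket.createScope(scopeName);
            scope.createCollection("products");
            scope.createCollection("customers");
        }
        return bucket;
    }

    public static void main(String[] args) {
        BucketModel bucket = buildExample();
        // The same collection names are fine under different scopes...
        System.out.println(bucket.scopes.get("dev").collections.keySet());
        // ...but a duplicate scope name in the same bucket is rejected.
        try {
            bucket.createScope("dev");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because each scope owns its collections, a document ID such as `p1` can exist independently in `dev.products` and `prod.products`.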
Use of Collections
You can see collections and scopes being used in the latest version of the Couchbase Demo Application, here:
https://github.com/couchbaselabs/try-cb-java/tree/6.5.0-branch
This application uses the existing “travel-sample” bucket as-is to let the user search for flights and hotels, but stores its own user and bookings data in collections. The structure is:
- Bucket: default
  - Scope: larson-travel
    - Collection: users
    - Collection: flights
When the user creates an account, a document is created in the “users” collection. When they book flights, documents are created in the “flights” collection, and referenced in the user’s document in the “users” collection.
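The reference-by-ID pattern can be sketched without a cluster. In this hypothetical Java model, two `HashMap`s stand in for the “users” and “flights” collections. Booking a flight stores it under a random UUID, as the demo code does, and appends only that ID to the user's “flights” array. Reading the bookings back follows those IDs. The class and method names are illustrative, not the demo's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// In-memory sketch of the users/flights layout; the two maps stand in for the
// "users" and "flights" collections. Hypothetical class, not Couchbase SDK code.
class BookingModel {
    final Map<String, Map<String, Object>> users = new HashMap<>();
    final Map<String, Map<String, Object>> flights = new HashMap<>();

    // The user document ID is the username; it starts with an empty "flights" array.
    void createUser(String username) {
        Map<String, Object> user = new HashMap<>();
        user.put("flights", new ArrayList<String>());
        users.put(username, user);
    }

    // Stores the flight under a fresh ID and records only that ID in the user document.
    @SuppressWarnings("unchecked")
    String bookFlight(String username, Map<String, Object> flight) {
        Map<String, Object> user = users.get(username);
        if (user == null) {
            throw new IllegalStateException("No such user: " + username);
        }
        String flightId = UUID.randomUUID().toString();
        flights.put(flightId, flight);
        ((List<String>) user.get("flights")).add(flightId);
        return flightId;
    }

    // Resolves the user's flight IDs back into flight documents.
    @SuppressWarnings("unchecked")
    List<Map<String, Object>> flightsFor(String username) {
        List<Map<String, Object>> result = new ArrayList<>();
        Map<String, Object> user = users.get(username);
        if (user == null) {
            return result;
        }
        for (String id : (List<String>) user.get("flights")) {
            Map<String, Object> f = flights.get(id);
            if (f == null) {
                throw new IllegalStateException("Unable to retrieve flight id " + id);
            }
            result.add(f);
        }
        return result;
    }
}
```

Keeping only IDs in the user document means the user document stays small however many flights are booked, at the cost of one extra lookup per flight when reading.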
This design allows multiple applications to share the same bucket. If we had a second instance of the demo app, used by another travel agency, we could just create another scope (with its own “users” and “flights” collections) and point the second instance at this scope by updating its application.properties file. The two instances would operate side by side without interfering with each other.
Example Code
To begin with, the bucket and scope that contain user and bookings information are named in the application.properties file:
```properties
storage.clientorg.bucket=default
storage.clientorg.scope=larson-travel
```
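The same two keys can be read outside Spring with `java.util.Properties`. Doing so shows that a second instance of the app, for another agency, differs only in the scope name. This is a hypothetical sketch: the `ClientOrgConfig` class and the `other-travel` scope name are illustrative and not part of the demo, which reads these values through Spring's `@Value` injection.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; in the demo, Spring's @Value injection reads these keys instead.
class ClientOrgConfig {
    final String bucket;
    final String scope;

    private ClientOrgConfig(String bucket, String scope) {
        this.bucket = bucket;
        this.scope = scope;
    }

    // Parses properties text and extracts the two storage.clientorg.* settings.
    static ClientOrgConfig load(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // StringReader does no real I/O, but Properties.load declares IOException.
            throw new UncheckedIOException(e);
        }
        String bucket = props.getProperty("storage.clientorg.bucket");
        String scope = props.getProperty("storage.clientorg.scope");
        if (bucket == null || scope == null) {
            throw new IllegalArgumentException(
                "storage.clientorg.bucket and storage.clientorg.scope are required");
        }
        return new ClientOrgConfig(bucket.trim(), scope.trim());
    }

    public static void main(String[] args) {
        ClientOrgConfig larson = load("storage.clientorg.bucket=default\nstorage.clientorg.scope=larson-travel\n");
        // A second agency's instance: same bucket, different (hypothetical) scope.
        ClientOrgConfig other = load("storage.clientorg.bucket=default\nstorage.clientorg.scope=other-travel\n");
        System.out.println(larson.bucket + " / " + larson.scope);
        System.out.println(other.bucket + " / " + other.scope);
    }
}
```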
These configuration values are picked up in the Database.java file:
```java
@Configuration
public class Database {
    ...
    @Value("${storage.clientorg.bucket}")
    private String clientOrgBucket;

    @Value("${storage.clientorg.scope}")
    private String clientOrgScope;
    ...
    public Bucket clientOrgBucket() {
        return loginCluster().bucket(clientOrgBucket);
    }

    public @Bean Scope clientOrgScope() {
        return clientOrgBucket().scope(clientOrgScope);
    }
}
```
In User.java, we see how a new flight is registered for the user. The scope bean created above is passed in. The username is the name the user logged in with.
```java
public Result<Map<String, Object>> registerFlightForUser(final Scope scope, final String username,
        final JsonArray newFlights) {
```
The ID of the user document is the username of the user. The application fetches it from the “users” collection in the scope configured for the application.
```java
    String userId = username;
    Collection usersCollection = scope.collection(USERS_COLLECTION_NAME);
    Collection flightsCollection = scope.collection(FLIGHTS_COLLECTION_NAME);
    Optional<GetResult> userDataFetch = usersCollection.get(userId);
    if (!userDataFetch.isPresent()) {
        throw new IllegalStateException();
    }
    JsonObject userData = userDataFetch.get().contentAsObject();

    if (newFlights == null) {
        throw new IllegalArgumentException("No flights in payload");
    }

    JsonArray added = JsonArray.empty();
```
The flights the user has booked are stored in the user document, in an array named “flights”.
```java
    JsonArray allBookedFlights = userData.getArray("flights");
    if (allBookedFlights == null) {
        allBookedFlights = JsonArray.create();
    }
```
We add the new flights to the existing flights.
```java
    for (Object newFlight : newFlights) {
        checkFlight(newFlight);
        JsonObject t = ((JsonObject) newFlight);
        t.put("bookedon", "try-cb-java");
        String flightId = UUID.randomUUID().toString();
        flightsCollection.insert(flightId, t);
        allBookedFlights.add(flightId);
        added.add(t);
    }

    userData.put("flights", allBookedFlights);
```
Then we store the new version of the user document.
```java
    usersCollection.upsert(userId, userData);

    JsonObject responseData = JsonObject.create()
        .put("added", added);

    return Result.of(responseData.toMap(), "Booked flight in Couchbase document " + userId);
}
```
Just below, we see how the flights of a user are retrieved.
```java
public List<Map<String, Object>> getFlightsForUser(final Scope scope, final String username) {
```
Get the user document from the “users” collection.
```java
    Collection users = scope.collection(USERS_COLLECTION_NAME);
    Optional<GetResult> doc = users.get(username);
    if (!doc.isPresent()) {
        return Collections.emptyList();
    }
    JsonObject data = doc.get().contentAsObject();
```
Get the “flights” array from the user document. It contains a list of flight ids.
```java
    JsonArray flights = data.getArray("flights");
    if (flights == null) {
        return Collections.emptyList();
    }
```
Retrieve each flight document from the “flights” collection by ID.
```java
    // The "flights" array contains flight ids. Convert them to actual objects.
    Collection flightsCollection = scope.collection(FLIGHTS_COLLECTION_NAME);
    List<Map<String, Object>> results = new ArrayList<Map<String, Object>>();
    for (int i = 0; i < flights.size(); i++) {
        String flightId = flights.getString(i);
        Optional<GetResult> res = flightsCollection.get(flightId);
        if (!res.isPresent()) {
            throw new RuntimeException("Unable to retrieve flight id " + flightId);
        }
        Map<String, Object> flight = res.get().contentAsObject().toMap();
        results.add(flight);
    }
    return results;
}
```
Changes From Earlier Code
The code for working with users and their flights was quite different in the previous version, which didn’t use collections. There, booked flights were stored directly in the user document, and the user document itself was stored directly in the “travel-sample” bucket. Here is the original code for the registerFlightForUser() method.
```java
public Result<Map<String, Object>> registerFlightForUser(final Bucket bucket, final String username,
        final JsonArray newFlights) {
```
Notice the use of a prefix to mark the type of document. This isn’t necessary once collections are available.
```java
    JsonDocument userData = bucket.get("user::" + username);
    if (userData == null) {
        throw new IllegalStateException();
    }

    if (newFlights == null) {
        throw new IllegalArgumentException("No flights in payload");
    }

    JsonArray added = JsonArray.empty();
```
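Plain maps can show why the `user::` prefix was needed. In a single shared keyspace, two documents of different types that happen to share an ID overwrite each other unless the type is encoded in the key. Separate collections remove that need. This is a hypothetical sketch: the `flight::` prefix and all class and method names are illustrative only. In the original demo, flights were embedded in the user document rather than stored as separate documents.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; plain maps stand in for a bucket and for collections.
class KeyStyles {

    // Legacy key for a user document in a shared bucket, as in the original code.
    static String legacyUserKey(String username) {
        return "user::" + username;
    }

    // Shared bucket without prefixes: a user and a flight that share an ID
    // collide, and the second write replaces the first.
    static int sharedBucketWithoutPrefixes(String id) {
        Map<String, String> bucket = new HashMap<>();
        bucket.put(id, "user document");
        bucket.put(id, "flight document");
        return bucket.size();
    }

    // Shared bucket with type prefixes ("flight::" is a hypothetical second prefix).
    static int sharedBucketWithPrefixes(String id) {
        Map<String, String> bucket = new HashMap<>();
        bucket.put(legacyUserKey(id), "user document");
        bucket.put("flight::" + id, "flight document");
        return bucket.size();
    }

    // One map per type, standing in for "users" and "flights" collections:
    // the bare ID suffices because the container already identifies the type.
    static int separateCollections(String id) {
        Map<String, String> users = new HashMap<>();
        Map<String, String> flights = new HashMap<>();
        users.put(id, "user document");
        flights.put(id, "flight document");
        return users.size() + flights.size();
    }

    public static void main(String[] args) {
        System.out.println(sharedBucketWithoutPrefixes("abc")); // 1: one document was overwritten
        System.out.println(sharedBucketWithPrefixes("abc"));    // 2
        System.out.println(separateCollections("abc"));         // 2
    }
}
```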
We retrieve the array of flights, which is already in the document.
```java
    JsonArray allBookedFlights = userData.content().getArray("flights");
    if (allBookedFlights == null) {
        allBookedFlights = JsonArray.create();
    }

    for (Object newFlight : newFlights) {
        checkFlight(newFlight);
        JsonObject t = ((JsonObject) newFlight);
        t.put("bookedon", "try-cb-java");
```
We add the flights to the array of booked flights.
```java
        allBookedFlights.add(t);
        added.add(t);
    }

    userData.content().put("flights", allBookedFlights);
```
And we store the user document.
```java
    JsonDocument response = bucket.upsert(userData);
    JsonObject responseData = JsonObject.create()
        .put("added", added);

    return Result.of(responseData.toMap(), "Booked flight in Couchbase document " + response.id());
}
```
Obviously, separating the booked flights from the user isn’t terribly compelling in a toy application. But in a production application, where we store many types of information about each user, it would make sense to keep some records outside the user document, particularly any that are large, numerous, or prone to frequent change.
Scopes and Collections Documentation
To find out more about how to work with scopes and collections directly, consult the Couchbase 6.5 collections documentation. It explains the REST API for working with both, the relevant CLI commands, and the collection information available from cbstats.
Summary
Collections and scopes let you organize documents within a Couchbase bucket, just as tables and schemas let you organize rows within relational databases. The Couchbase 6.5 GA release provides early, limited support for collections and scopes as a Developer Preview feature. To get started with collections and scopes, you can try the Java Demo Application right now.
Hey Johan, this is a really nice feature to have, especially for applications that implement multi-tenancy based on a data discriminator property. I wanted to know whether it is possible to execute a query on multiple scopes. For some queries, we want to get data from multiple scopes; for other queries, data should come back from a single scope. Is this possible in Couchbase 6.5? Thanks
Hi, Malik. It is not possible to write N1QL queries against collections in 6.5. Right now, N1QL queries cannot work against collections, only against buckets, as they have in the past. N1QL for collections is coming in 7.0, probably some time in 2020.
Once N1QL works for collections, it will be possible to write queries that span multiple scopes, just as it is now possible to write N1QL queries that span multiple buckets.
Since “All the documents within a couchbase collection should be of the same type, just as all the records in a relational table are of the same type”, does it imply that all documents within a collection should have the same fields?
Thanks