Couchbase 4.0 Data Modelling

mail_adwivedi · November 12, 2015, 6:57pm

I have an application with entities like User, Message and MessageFeatures. Each User can have many messages and each message has a MessageFeatures entity. Currently the relational model is expressed as:

User{
  UUID id
  String email
  ...
}
Message{
  UUID id,
  UUID userId
  String text
  ....
}
MessageFeatures{
  UUID id
  UUID messageId
  UUID userId
  PrimitiveObject feature1
  ....
  PrimitiveObject featureN
}

The most important queries are:

Get all messages for user
Get all message features for a user
Get message by uuid
Get/Update message feature by uuid
Get message feature by message uuid

Less important(can be slow) queries are like :

Get message features where user_id = someuuid and featureX = value
Get all/count user uuids for which featureX = value
update message features set featureX = newValue where featureX = oldValue

While evaluating couchbase i am unable to arrive at a proper data model. I do not think putting all messages and message features for a user in a single document is a good idea because the size will keep on increasing and based on current data it will easily be in range of 4-5 MB for 2 year data. Also to maintain consistency i can update only one message feature at a time as atomicity is per document.

If i do not place them in a single document they will be scattered around the cluster and queries like get all messages/messagefeatures of a user will result in scatter and gather.

I have checked out global secondary indexes and N1QL but even if I index user_uuid field of messages it will only help in fetching the message_uuids of that user, loading all the messages will result in scatter and gather…

Is there a way to force that all messages, message features of a user_uuid get mapped to a same physical node without embedding them in the same document something like hashtags in redis.

trustanish · November 17, 2015, 6:35am

How about creating different documents for each message. Let message features be a part of the message document. Let the message UUID be the key for the respective message document. Create User UUID as a property in the message document.

Data model
//User document

User::UUID
{
type:“user”,
uuid:"",
email:""
…
}

// Message document with features as a property for the document
Message::UUID{
type:“message”,
userid: “”,
text:"",
MessageFeatures: {
UUID:"",
messageID:"“
userid:”“
feature1:”",
…
featureN":""
}
…
}

CREATE PRIMARY INDEX
CREATE Secondary Index on userid of the message document.

So all your queries as mentioned below will be using indexes
Get all messages for user – Secondary Index
Get all message features for a user – Secondary Index
Get message by uuid – Primary Index
Get/Update message feature by uuid – Primary Index
Get message feature by message uuid – Primary Index

Hope this solves the problem for you.

mail_adwivedi · November 18, 2015, 7:00am

Thanks @trustanish but Get/Update message feature by uuid will not use primary index as i will have features uuid and not message uuid.

Also will get all messages by user_id be efficient as it will result in a secondary index evaluation and than a multiget across many data nodes which will not hold data in ram.

Also will it be prudent to maintain own index like user::uuid::messageids which is a document holding all message uuids of that user… this will remove the need for secondary indexing and one can directly do a multiget.

Topic		Replies	Views
DB Design for high volume chat messages Couchbase Server	3	6411	November 28, 2015
Design users chat data Couchbase Server	8	2571	May 11, 2015
Document Modeling Couchbase Server	1	1772	July 30, 2013
Modeling a "chat" database Couchbase Lite data_modelling	4	4862	January 28, 2016
Best approach to building a Chat Application Couchbase Server	2	2662	November 24, 2015

Couchbase 4.0 Data Modelling

Related topics