I have an application with entities like User, Message and MessageFeatures. Each User can have many messages and each message has a MessageFeatures entity. Currently the relational model is expressed as:
User{
UUID id
String email
...
}
Message{
UUID id,
UUID userId
String text
....
}
MessageFeatures{
UUID id
UUID messageId
UUID userId
PrimitiveObject feature1
....
PrimitiveObject featureN
}
The most important queries are:
- Get all messages for user
- Get all message features for a user
- Get message by uuid
- Get/Update message feature by uuid
- Get message feature by message uuid
Less important(can be slow) queries are like :
- Get message features where user_id = someuuid and featureX = value
- Get all/count user uuids for which featureX = value
- update message features set featureX = newValue where featureX = oldValue
While evaluating couchbase i am unable to arrive at a proper data model. I do not think putting all messages and message features for a user in a single document is a good idea because the size will keep on increasing and based on current data it will easily be in range of 4-5 MB for 2 year data. Also to maintain consistency i can update only one message feature at a time as atomicity is per document.
If i do not place them in a single document they will be scattered around the cluster and queries like get all messages/messagefeatures of a user will result in scatter and gather.
I have checked out global secondary indexes and N1QL but even if I index user_uuid field of messages it will only help in fetching the message_uuids of that user, loading all the messages will result in scatter and gather…
Is there a way to force that all messages, message features of a user_uuid get mapped to a same physical node without embedding them in the same document something like hashtags in redis.