Help with query performance in CBL

I have a requirement to get the most recent documents given a bunch of keys. Imagine a chat-room document, and chat-message documents that have a chatRoomId, a timestamp and some content:

chatRoom = {
  type: 'chat-room',
  id: '1234'
}

chatMessage = {
  type: 'chat-message',
  id: 'abcd',
  chatRoomId: '1234',
  timestamp: '2016-09-21T15:06:32.257Z',
  content: '...

I want to get the latest chatMessage for each chat-room, or for a set of chat-rooms. My current solution is to have a view, indexed by chatRoomId and timestamp:

"map": function (doc) {
    if(doc.type !== 'chat-message')
      return;

    emit([doc.chatRoomId, doc.timestamp], null);
}

And to use a query with these options:

{
  descending: true,
  limit: 1,
  include_docs: true,
  startkey: [chatRoomId, {}],
  endkey: [chatRoomId],
}

The solution isn’t good though. To get the most recent message for multiple chat rooms I need to issue many queries and collect up the results. Is there something better I could try? Perhaps a reduced view?
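For reference, the one-query-per-room approach looks roughly like the sketch below. `queryView` is a hypothetical function that takes the query options and returns a promise for the view result (an object with a `rows` array); it stands in for whatever call actually runs the view query.

```javascript
// Options for the "latest message in one room" query shown above.
function latestMessageOptions(chatRoomId) {
  return {
    descending: true,
    limit: 1,
    include_docs: true,
    startkey: [chatRoomId, {}],
    endkey: [chatRoomId],
  };
}

// queryView is hypothetical: (options) => Promise<{ rows: [...] }>.
// One query is issued per chat room and the results are collected.
async function latestMessages(chatRoomIds, queryView) {
  const results = await Promise.all(
    chatRoomIds.map(id => queryView(latestMessageOptions(id)))
  );
  // Each result holds at most one row; rooms without messages contribute nothing.
  return results.flatMap(result => result.rows.map(row => row.doc));
}
```

The cost is one round trip per chat room, which is what I would like to avoid.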

I implemented exactly this for an unfinished chat sample app a few years ago. (It might still be around in the couchbaselabs org on Github — it was called “CouchChat” or something like that.)

The answer is to use grouping. Set the group level to 1, which will aggregate all the rows of each chatRoomId, and use a reduce function that simply returns the second component of the first key (which, if you issue a descending query, will be the largest date).
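A minimal sketch of the idea in plain JavaScript, run against an in-memory array rather than a real view index. The `groupedQuery` helper and the sample rows are made up for illustration; the helper only imitates what a descending query with group level 1 does and is not how Couchbase Lite implements it. It assumes the reduce function receives the emitted keys directly, and it relies on ISO 8601 timestamps sorting correctly as strings.

```javascript
// Rows as the map function would emit them: key = [chatRoomId, timestamp].
const sampleRows = [
  { key: ['1234', '2016-09-21T15:06:32.257Z'], value: null },
  { key: ['1234', '2016-09-22T08:00:00.000Z'], value: null },
  { key: ['5678', '2016-09-20T10:30:00.000Z'], value: null },
];

// Reduce: return the second component (the timestamp) of the first key.
function latestTimestamp(keys, values) {
  return keys[0][1];
}

// Hypothetical helper imitating a descending query with group level 1.
function groupedQuery(rows, reduceFn) {
  // Sort descending by chat room id, then by timestamp.
  const sorted = [...rows].sort((a, b) => {
    if (a.key[0] !== b.key[0]) return a.key[0] < b.key[0] ? 1 : -1;
    if (a.key[1] !== b.key[1]) return a.key[1] < b.key[1] ? 1 : -1;
    return 0;
  });
  // Group on the first key component, keeping the sorted order.
  const groups = new Map();
  for (const row of sorted) {
    const roomId = row.key[0];
    if (!groups.has(roomId)) groups.set(roomId, []);
    groups.get(roomId).push(row);
  }
  // Run the reduce function once per group.
  return [...groups].map(([roomId, group]) => ({
    key: [roomId],
    value: reduceFn(group.map(r => r.key), group.map(r => r.value)),
  }));
}

const latestPerRoom = groupedQuery(sampleRows, latestTimestamp);
```

Because each group is in descending order, the first key in the group carries the newest timestamp, so one grouped query yields one row per chat room.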

Thanks jens - is this the project? Not quite sure what you mean by ‘grouping’, can you point me at some docs (this is all I could find)? I think I understand…

I should query like this:

{
  descending: true,
  limit: 1,
  include_docs: true,
  group_level: 1
}

?? Do I need to specify a key?

But I don’t quite follow your idea for the reduce function.

Ah, I think I’ve got it. So the reduce function looks like this:

"reduce": function (keys, values) {
      return keys[0][1];
}

And the query only has these parameters:

{
  descending: true,
  group_level: 1,
}

Which gives me a single result (the most recent) for each group. But it doesn’t return the document id, and adding include_docs=true doesn’t seem to return the documents. So do I have to then re-query to pull the documents out?

Yes, reduced queries return aggregate data so the rows are no longer associated with documents.

I think you’d need to add the docID to the value emitted by the map function; then you can return it from the reduce function together with the date.

Right, so map the doc ids, then return them…

"map": function...
        emit([doc.chatRoomId, doc.timestamp], doc._id);
},

"reduce": function (keys, values) {
      return values[0];
}
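Following the earlier suggestion to return the doc ID together with the date, a variant of the reduce function could look like the sketch below. It assumes, as the functions above do, that `keys` holds the emitted keys and that the rows arrive in descending order. The sample keys and doc ids are made up for illustration.

```javascript
// Variant reduce: return the newest row's timestamp and doc id together.
function latestWithId(keys, values) {
  return { timestamp: keys[0][1], docId: values[0] };
}

// Made-up rows for one chat room, already in descending key order,
// as emitted by: emit([doc.chatRoomId, doc.timestamp], doc._id)
const roomKeys = [
  ['1234', '2016-09-22T08:00:00.000Z'],
  ['1234', '2016-09-21T15:06:32.257Z'],
];
const roomValues = ['efgh', 'abcd'];

// newest.docId is the id to collect for the follow-up document fetch.
const newest = latestWithId(roomKeys, roomValues);
```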

Then bulk get the doc ids after…

One problem, I tried POSTing to the _bulk_get end point (http://lite.couchbase./mydbname/_bulk_get) but got a 404? (Using CBL 1.3)

Oh, and you asked when the reduce happens. In Couchbase Server, as in CouchDB, reduced values are stored in the index B-tree to make querying faster. I couldn’t figure out a way to reproduce this without having access to the innards of the storage engine, so Couchbase Lite does the reducing as part of each query. So far it seems to be fast enough for the smaller data sets used on mobile.

So do reduced views behave differently in CBL than in Couchbase Server? In CBL the parameters are applied when the view is queried, and the results are then reduced. But in Couchbase Server, if the views are already reduced, I assume the parameters are applied to the reduced view… or are they just ignored?

Still can’t get _bulk_get working to fetch my docs now that I’ve collected up the doc ids. It’s 404ing. I’m currently using the _all_docs endpoint with keys=[...]&include_docs=true, but it’s quite limited because the query string gets too long.

The output of the reduced query is identical either way. Couchbase Server / CouchDB are just pre-caching (“memoizing”) reduced values in the persistent index to speed up the computations.

I believe we forgot to implement _bulk_get in Couchbase Lite, embarrassingly. A workaround is to use an _all_docs query and provide a keys parameter with the doc IDs you want.
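If the query string gets too long, one thing that may be worth trying is sending the keys in a POST body, which CouchDB's `_all_docs` allows. Whether the Couchbase Lite 1.3 listener accepts this form is an assumption to verify. The sketch below only builds the request and unpacks a response; the helper names are made up for illustration.

```javascript
// Build a POST request for _all_docs with the keys in the JSON body.
// Assumption: the listener accepts this CouchDB-style form of the request.
function buildAllDocsRequest(dbUrl, docIds) {
  return {
    url: dbUrl + '/_all_docs?include_docs=true',
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ keys: docIds }),
    },
  };
}

// Pull the documents out of an _all_docs response, skipping rows
// that carry no doc (for example ids that were not found).
function docsFromResponse(response) {
  return response.rows.filter(row => row.doc).map(row => row.doc);
}

const allDocsRequest = buildAllDocsRequest(
  'http://lite.couchbase./mydbname',
  ['abcd', 'efgh']
);
// e.g. fetch(allDocsRequest.url, allDocsRequest.options)
//        .then(res => res.json())
//        .then(docsFromResponse)
```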

Thanks Jens

Please at least update the docs. Can’t tell you how many hours I’ve wasted reading or following incorrect CB documentation.

Docs updated to remove the _bulk_get endpoint from the CBL docs. Sorry about the documentation issues you’ve encountered, we’ll try to get better at it. I know how frustrating it can be.
Change set: https://github.com/couchbaselabs/couchbase-mobile-portal/commit/b8a99afd6df7a76bc8a9213898e072cae1a61d4b

James
