I’m running Couchbase 4.5 with 3 data nodes and 1 index node. We have about 200k documents in the system, and we are using memory-optimized indexes. We use Couchbase Sync Gateway to replicate with mobile devices.
I’ve noticed that when a mobile device syncs, many documents are put into the ‘remaining to be indexed’ queue for my N1QL indexes; I can see this happening on the index node dashboard. It usually takes 5-10 seconds before that queue empties out.
I have been running N1QL queries on the server side with the request_plus scan consistency setting to guarantee consistency. So when a mobile device syncs, all other N1QL queries hang until the ‘remaining to be indexed’ queue is emptied. That makes sense given how request_plus is supposed to work.
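For reference, this is roughly how we issue those queries against the query service REST API (the host, bucket name, and statement below are placeholders; the same scan_consistency parameter can also be set through the SDKs):

```
# Placeholder host and statement; scan_consistency=request_plus makes the
# query wait until the indexes have caught up to the latest mutations.
curl http://queryhost:8093/query/service \
  --data-urlencode 'statement=SELECT userName FROM `bucket` WHERE `type` = "task"' \
  --data-urlencode 'scan_consistency=request_plus'
```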
What I don’t understand is how to improve the performance of indexing. Is it possible that I just have a N1QL index that takes a long time to compute? If so, how would I identify that? Do I need to scale up my hardware? My cpu usage and memory are not even close to capped, so what would I need to scale?
@ryanthompson0123, how much “Index RAM Quota” are you working with? Can you also check your “Max Index RAM Used %” for the index node on the dashboard?
I have a single index node with 16GB of RAM, and I’ve set the Index RAM Quota to 10GB. My “Max Index RAM Used %” is holding steady at 6%. We have 3 other nodes that run both the Data and Query services.
That looks reasonable. Can you check your “Indexer Threads” value under UI->Settings->Cluster->Index Settings->Advanced?
Also, how many indexes have been created on the index node, and what is the size of the indexed keys?
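If it helps, you can list the current index definitions with a quick query against system:indexes (the field list below is just a sample of what that keyspace exposes):

```
-- Lists every index the cluster knows about, with its keys,
-- partial-index condition, and build state.
SELECT name, keyspace_id, index_key, `condition`, state
FROM system:indexes;
```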
Indexer Threads is set to 7 (it’s an 8-core machine).
We have a primary N1QL index and 11 other indexes. The keys being indexed are small: usernames, updated dates, etc. The average item size ranges from 100 to 200 bytes.
Also, here are the charts for the last hour. At the spikes, we have 2-3 users syncing simultaneously, each pushing up about 200 changes.
The workload is fairly small compared to what memory-optimized indexes can handle with the hardware resources available. Can you check the “avg scan latency” for each index on the dashboard and see which one is taking long to respond?
The stats for the queue can take up to 5 seconds to refresh.
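If the dashboard is awkward for this, the same per-index stats can also be pulled from the indexer’s HTTP port (9102 by default; the hostname below is a placeholder), where avg scan latency is reported per index in nanoseconds:

```
# Dumps the raw indexer stats as JSON; look for the avg_scan_latency
# entries, which are keyed per bucket and index.
curl -s http://indexnode:9102/stats
```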
The worst index we have is the one below; there are 2 others with scan latencies in the 10k range. The index definition is:
Definition:

CREATE INDEX type_username_index ON `bucket`(`type`, `userName`)
WHERE `_sync` IS NOT NULL AND `_deleted` IS MISSING
WITH {"defer_build": true}
We think we can optimize this index if that will improve overall system performance. It looks like just about every document in the db ends up in this index, and that’s not what we want. We only need certain document types indexed, so we should be able to shrink it quite a bit.
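For example, something like this is what we have in mind (the “task” type value is just a placeholder for one of our actual document types):

```
-- Restricting the partial-index condition to a specific type keeps
-- only the documents we actually query in the index.
CREATE INDEX type_username_index ON `bucket`(`type`, `userName`)
WHERE `type` = "task"
  AND `_sync` IS NOT NULL
  AND `_deleted` IS MISSING
WITH {"defer_build": true};
```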
The scan latency is reported in nanoseconds, so even for this index the latency doesn’t look bad. I think you need to look at the longest-running queries and check their explain plans to see whether they are using covering index scans. Otherwise the query service has to go to the data service for extra fetches, which can add significant latency.
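For example (with placeholder values), run EXPLAIN on one of the slow queries; if the IndexScan operator in the plan lists the indexed expressions under a “covers” array, the scan is covering and no follow-up Fetch from the data service is needed:

```
-- If type_username_index covers this query, the plan's IndexScan
-- operator will include a "covers" array and there will be no Fetch step.
EXPLAIN SELECT `userName`
FROM `bucket`
WHERE `type` = "task" AND `userName` = "jsmith"
  AND `_sync` IS NOT NULL AND `_deleted` IS MISSING;
```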