Hi there,
I’m seeing extremely inconsistent results when trying to retrieve values for a key from my views. A view request can take anywhere from 40ms to 2s, and it doesn’t matter whether it’s a new request or a repeat of the same one: each time I run the query, the time to complete the request is totally unpredictable.
I have approximately 1.5M records on a cluster of 3 EC2 instances running Couchbase 2.0. I’m using ‘stale=ok’ to ensure that I’m not forcing a reindex. Get/Set performance is fantastic; I only have issues when attempting to retrieve values from views.
I have tried this both with libcouchbase via the couch node.js project and with direct HTTP requests; the performance is equivalent either way.
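For reference, the direct HTTP version is roughly the following (host, bucket, design doc, and view names are placeholders, not my real ones):

    var http = require('http');

    // Placeholder names: adjust host, bucket, design doc, and view to your setup.
    var url = 'http://ec2-node-1:8092/mybucket/_design/mydesign/_view/by_key' +
              '?key=%22some-key%22&stale=ok';

    var start = Date.now();
    http.get(url, function (res) {
      var body = '';
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () {
        // log elapsed time so the 40ms-2s variance shows up per request
        console.log('status ' + res.statusCode + ' in ' + (Date.now() - start) + 'ms');
      });
    }).on('error', console.error);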
Any ideas as to what would cause these kinds of issues? What tools would you recommend for identifying bottlenecks?
It is really strange, because stale=ok is documented as providing “the fastest response times to a given query, since the existing index will be used without being updated”, neither before nor after the query.
Maybe you have other queries hitting the same view at those times, which force re-indexing?
Hi, thanks for the response…
My application is still in a dev environment, so I have complete control over what queries are hitting the cluster. In this case, my query is the only one at a given time. This is verifiable in the admin console: I see a slight spike in activity after making a query, whereas the indicators are pretty much flat otherwise. Any suggestions for diagnostic steps? It would be nice if I could understand exactly what’s happening behind the scenes when I make a query.
I’ve noticed that my views seem to be reindexing more or less continuously. Does that indicate an underlying problem? No new data is flowing in, so it seems unnecessary.
Are you injecting/updating data all the time, or is your dataset “fixed” when you run your queries? (Just trying to debug this, because with stale=ok you should not have any issue.)
Also remember that the query is executed in a “scatter/gather” fashion: all nodes of your cluster are called, so the query will only be as fast as the slowest machine in your cluster. Have you noticed anything bad at the VM/EC2 level?
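One thing you can try is adding debug=true to the view request (the URL below uses placeholder names); the JSON response should then include a debug_info section with per-node information, which can help show where the time is going:

    http://<node>:8092/mybucket/_design/mydesign/_view/by_key?stale=ok&debug=true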
Hi Tug,
Thanks for the response… Indeed, as I dig into it more, it does seem that the issue is with our views, particularly their tendency to get fairly large. The debug parameter is helpful, but do you have a more detailed explanation of what the specific error codes mean? Once there are sufficient records in the system, our problematic view alternates between ‘too_large_btree_state’ and ‘function_clause’ error codes (it’s also odd that the error codes aren’t consistent). With a small number of records, everything seems fine.

I realized that we might be hitting the 64KB limit for objects in our reduce, and I’ve done some work to make sure that the internal objects used in the reduce function get compacted if they grow too large (basically, we are doing some aggregation that requires a map, and if it includes too many records, I lop off some chunk of it). However, these measures don’t seem to make a difference. I’m not sure whether garbage collection can happen during a reduce function, so perhaps there is no real way to trim my map object?
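For context, here is a stripped-down sketch of the kind of reduce we’re running (the field handling and the size cap are simplified placeholders, not our actual code):

    function (keys, values, rereduce) {
      var MAX_ENTRIES = 100; // placeholder for our real size cap

      var agg = {};
      values.forEach(function (v) {
        if (rereduce) {
          // v is a previously reduced object: merge its counts in
          for (var k in v) {
            agg[k] = (agg[k] || 0) + v[k];
          }
        } else {
          // v is a mapped value: count occurrences
          agg[v] = (agg[v] || 0) + 1;
        }
      });

      // "lop off" part of the map once it gets too big, to try to stay
      // under the reduce output size limit
      var ks = Object.keys(agg);
      if (ks.length > MAX_ENTRIES) {
        ks.slice(MAX_ENTRIES).forEach(function (k) { delete agg[k]; });
      }

      return agg;
    }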
Any insight would be greatly appreciated.