I'd recommend trying to correlate one of the timeouts with the log messages around that time from com.couchbase.client.core. As @subhashni mentioned, a TimeoutException is typically an effect, not the cause. The cause may be anything from a networking issue to a slow query, a busy JVM, or an oversubscribed OS or virtual machine.
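If you don't already have client logging hooked up, here's a minimal sketch for turning up the client-core logger with plain java.util.logging (if you route logging through SLF4J or log4j instead, set the level for com.couchbase.client.core there):
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// FINE is roughly DEBUG; this surfaces the client-core messages you'd want
// to line up against the timeout timestamps.
Logger coreLogger = Logger.getLogger("com.couchbase.client.core");
coreLogger.setLevel(Level.FINE);
ConsoleHandler handler = new ConsoleHandler();
handler.setLevel(Level.FINE);
coreLogger.addHandler(handler);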
One thing you may want to consider is enabling the in-client metrics collection. It could be that most of your operations are fast, but there are some outliers, and some of those outliers may be slow enough to exceed the 5 seconds you're allowing.
For example:
import com.couchbase.client.core.metrics.DefaultLatencyMetricsCollectorConfig;
import com.couchbase.client.core.metrics.DefaultMetricsCollectorConfig;
import com.couchbase.client.core.metrics.LatencyMetricsCollectorConfig;
import com.couchbase.client.core.metrics.MetricsCollectorConfig;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;
import java.util.concurrent.TimeUnit;

// Emit network latency metrics every 10s and runtime metrics every 30s;
// a value <= 0 disables that collector.
int networkMetricsInterval = 10;
int runtimeMetricsInterval = 30;

LatencyMetricsCollectorConfig latencyConfig = networkMetricsInterval <= 0
    ? DefaultLatencyMetricsCollectorConfig.disabled()
    : DefaultLatencyMetricsCollectorConfig
        .builder()
        .emitFrequency(networkMetricsInterval)
        .emitFrequencyUnit(TimeUnit.SECONDS)
        .build();

MetricsCollectorConfig runtimeConfig = runtimeMetricsInterval <= 0
    ? DefaultMetricsCollectorConfig.disabled()
    : DefaultMetricsCollectorConfig.create(runtimeMetricsInterval, TimeUnit.SECONDS);

DefaultCouchbaseEnvironment.Builder builder = DefaultCouchbaseEnvironment
    .builder()
    .runtimeMetricsCollectorConfig(runtimeConfig)
    .networkLatencyMetricsCollectorConfig(latencyConfig);

Cluster cluster = CouchbaseCluster.create(builder.build(), "10.112.150.101");
Bucket bucket = cluster.openBucket("travel-sample");
That’ll output lines like this every 10s:
INFO: {"10.112.150.101":{"QUERY":{"GenericQueryRequest":{"SUCCESS":{"metrics":{"percentiles":{"50.0":52953,"90.0":106954,"95.0":126353,"99.0":272629,"99.9":666894},"min":37748,"max":666894,"count":241,"timeUnit":"MICROSECONDS"}}}},"BINARY":{"KeepAliveRequest":{"SUCCESS":{"metrics":{"percentiles":{"50.0":448,"90.0":479,"95.0":524,"99.0":524,"99.9":524},"min":382,"max":524,"count":7,"timeUnit":"MICROSECONDS"}}},"GetBucketConfigRequest":{"SUCCESS":{"metrics":{"percentiles":{"50.0":1228,"90.0":3063,"95.0":3063,"99.0":3063,"99.9":3063},"min":602,"max":3063,"count":5,"timeUnit":"MICROSECONDS"}}}},"VIEW":{"KeepAliveRequest":{"SUCCESS":{"metrics":{"percentiles":{"50.0":1122,"90.0":1613,"95.0":1695,"99.0":1695,"99.9":1695},"min":737,"max":1695,"count":10,"timeUnit":"MICROSECONDS"}}}}},"10.112.150.103":{"QUERY":{"GenericQueryRequest":{"SUCCESS":{"metrics":{"percentiles":{"50.0":48758,"90.0":89128,"95.0":116391,"99.0":182452,"99.9":1543503},"min":38535,"max":1543503,"count":240,"timeUnit":"MICROSECONDS"}}}},"BINARY":{"KeepAliveRequest":{"SUCCESS":{"metrics":{"percentiles":{"50.0":1179,"90.0":1384,"95.0":1384,"99.0":1384,"99.9":1384},"min":391,"max":1384,"count":4,"timeUnit":"MICROSECONDS"}}},"GetBucketConfigRequest":{"SUCCESS":{"metrics":{"percentiles":{"50.0":790,"90.0":1564,"95.0":1564,"99.0":1769,"99.9":1769},"min":479,"max":1769,"count":14,"timeUnit":"MICROSECONDS"}}}},"VIEW":{"KeepAliveRequest":{"SUCCESS":{"metrics":{"percentiles":{"50.0":897,"90.0":5144,"95.0":7438,"99.0":7438,"99.9":7438},"min":733,"max":7438,"count":10,"timeUnit":"MICROSECONDS"}}}}},"10.112.150.102":{"QUERY":{"GenericQueryRequest":{"SUCCESS":{"metrics":{"percentiles":{"50.0":53477,"90.0":103809,"95.0":121634,"99.0":173015,"99.9":3405774},"min":40108,"max":3405774,"count":241,"timeUnit":"MICROSECONDS"}}}},"BINARY":{"KeepAliveRequest":{"SUCCESS":{"metrics":{"percentiles":{"50.0":518,"90.0":2572,"95.0":2572,"99.0":2572,"99.9":2572},"min":378,"max":2572,"count":5,"timeUnit":"MICROSECONDS"}}},"GetBucketConfigRequest":{"SUCCESS":{"metrics":{"percentiles":{"50.0":1245,"90.0":5308,"95.0":19529,"99.0":19529,"99.9":19529},"min":456,"max":19529,"count":10,"timeUnit":"MICROSECONDS"}}}},"VIEW":{"KeepAliveRequest":{"SUCCESS":{"metrics":{"percentiles":{"50.0":1064,"90.0":2834,"95.0":3096,"99.0":3096,"99.9":3096},"min":638,"max":3096,"count":10,"timeUnit":"MICROSECONDS"}}}}},"event":{"name":"NetworkLatencyMetrics","type":"METRIC"}}
Note that in mine, most operations are very fast, but there are some outliers. If you see values getting up into the seconds at the 95th or 99th percentile (the values are reported in microseconds, so 1,000,000 is one second), see whether you can correlate them with JVM/VM activity, etc.:
"percentiles":{"50.0":52953,"90.0":106954,"95.0":126353,"99.0":272629,"99.9":666894}```
In my environment, the console showed latencies ranging from about 50ms up to 3.5s, all for the same query type:
![console output showing query latencies](https://cdck-file-uploads-global.s3.dualstack.us-west-2.amazonaws.com/couchbase/original/2X/e/eb851ef40f3575f15678d8c04200aa0c152ef603.jpg)
The RuntimeMetricsEvent can help you spot GCs, though there are better profilers for that kind of thing.
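If you'd rather consume these metrics programmatically instead of scraping the log output, you can subscribe to the environment's event bus. Here's a rough sketch assuming the same builder as above; what you do with each event (stderr here) is up to you:
import com.couchbase.client.core.event.metrics.NetworkLatencyMetricsEvent;
import com.couchbase.client.core.event.metrics.RuntimeMetricsEvent;
import com.couchbase.client.java.env.CouchbaseEnvironment;

// Build the environment once and reuse it for both the event bus subscription
// and the cluster connection.
CouchbaseEnvironment env = builder.build();
env.eventBus().get()
    .filter(event -> event instanceof NetworkLatencyMetricsEvent
        || event instanceof RuntimeMetricsEvent)
    .subscribe(event -> System.err.println(event));
Cluster cluster = CouchbaseCluster.create(env, "10.112.150.101");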