GAT Operation Timeout - KvTimeout is 2.5s

For a number of months now I have been seeing exceptions logged with the following error. It is frequent to the point that our logs are unmanageable, even with an average of only 100 hourly users (approximately 10 requests a second).

SDK Version 3.4.12

The GAT operation [OpaqueID]/[UniqueIdentifier] timed out after 00:00:02.5010452. It was retried 0 times using Couchbase.Core.Retry.BestEffortRetryStrategy. The KvTimeout is 00:00:02.5000000.

I found a number of similar posts but no resolution:

https://www.couchbase.com/forums/t/couchbase-core-retry-besteffortretrystrategy-timeout-error/33194

I have enabled logs and am unable to determine the cause. Example log:

2023-11-28 09:33:58,792 [9] DEBUG Couchbase.Core.ClusterNode - Executing op GAT on MACHINE03.internal.com:11210 with key 443553e9-0c06-41c0-864d-d32a8b1f4091 and opaque 81.
2023-11-28 09:33:58,792 [9] DEBUG Couchbase.KeyValue.CouchbaseCollection - Completed fetching CID for _default._default
2023-11-28 09:33:58,792 [9] DEBUG Couchbase.CouchbaseBucket - Sending op GAT with 443553e9-0c06-41c0-864d-d32a8b1f4091 to MACHINE03.internal.com:11210 using VBID: 856
2023-11-28 09:33:58,792 [9] DEBUG Couchbase.Core.ClusterNode - CB: Current state is Closed.
2023-11-28 09:33:58,792 [9] DEBUG Couchbase.Core.ClusterNode - Executing op GAT on MACHINE03.internal.com:11210 with key 443553e9-0c06-41c0-864d-d32a8b1f4091 and opaque 82.
2023-11-28 09:33:58,792 [9] DEBUG Couchbase.KeyValue.CouchbaseCollection - Completed fetching CID for _default._default
2023-11-28 09:33:58,792 [9] DEBUG Couchbase.CouchbaseBucket - Sending op GAT with 443553e9-0c06-41c0-864d-d32a8b1f4091 to MACHINE03.internal.com:11210 using VBID: 856
2023-11-28 09:33:58,792 [9] DEBUG Couchbase.Core.ClusterNode - CB: Current state is Closed.
2023-11-28 09:33:58,792 [9] DEBUG Couchbase.Core.ClusterNode - Executing op GAT on MACHINE03.internal.com:11210 with key 443553e9-0c06-41c0-864d-d32a8b1f4091 and opaque 83.
2023-11-28 09:33:58,792 [9] DEBUG Couchbase.KeyValue.CouchbaseCollection - Completed fetching CID for _default._default
2023-11-28 09:33:58,792 [9] DEBUG Couchbase.CouchbaseBucket - Sending op GAT with 443553e9-0c06-41c0-864d-d32a8b1f4091 to MACHINE03.internal.com:11210 using VBID: 856
2023-11-28 09:33:58,792 [9] DEBUG Couchbase.Core.ClusterNode - CB: Current state is Closed.
2023-11-28 09:33:58,792 [9] DEBUG Couchbase.Core.ClusterNode - Executing op GAT on MACHINE03.internal.com:11210 with key 443553e9-0c06-41c0-864d-d32a8b1f4091 and opaque 84.
2023-11-28 09:33:58,792 [9] DEBUG App.Metrics.Internal.DefaultMetricsRegistry - Adding Timer Request Timer|db.operation:GetCidByName - App.Metrics.Timer.TimerOptions req Microseconds Microseconds System.Collections.Generic.Dictionary`2[System.String,System.String]

Failure messages:

2023-11-28 09:34:01,309 [25] DEBUG Couchbase.Core.ClusterNode - KV Operation timeout for op GAT on MACHINE03.internal.com:11210 with key 443553e9-0c06-41c0-864d-d32a8b1f4091 and opaque 84. Is orphaned: True
2023-11-28 09:34:01,309 [26] DEBUG Couchbase.Core.ClusterNode - KV Operation timeout for op GAT on MACHINE03.internal.com:11210 with key 443553e9-0c06-41c0-864d-d32a8b1f4091 and opaque 83. Is orphaned: True
2023-11-28 09:34:01,309 [26] DEBUG Couchbase.Core.ClusterNode - CB: Marking a failure for 83 to MACHINE03.internal.com:11210.
2023-11-28 09:34:01,309 [25] DEBUG Couchbase.Core.ClusterNode - CB: Marking a failure for 84 to MACHINE03.internal.com:11210.

Typical success message:

2023-11-28 09:34:01,293 [22] DEBUG Couchbase.Core.ClusterNode - Completed executing op GAT on MACHINE03.internal.com:11210 with key 443553e9-0c06-41c0-864d-d32a8b1f4091 and opaque 81
2023-11-28 09:34:01,293 [24] DEBUG Couchbase.Core.ClusterNode - Completed executing op GAT on MACHINE03.internal.com:11210 with key 443553e9-0c06-41c0-864d-d32a8b1f4091 and opaque 82

This is the .NET SDK?
A timeout exception occurs when the operation takes longer than the specified timeout. The exception can be avoided by (a) increasing the specified timeout, or (b) decreasing the execution time of the operation. I assume that you would prefer (b). You will need to investigate why the operations are taking as long as they do. One possible reason is that the server is saturated. Check the server dashboard. Another tool is Observability - Slow Operations Logging | Couchbase Docs
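For completeness, option (a) is a one-line configuration change. A minimal sketch using the `ClusterOptions.KvTimeout` property from the .NET SDK 3.x (the connection string, credentials, and the 5-second value are placeholders, not recommendations):

```csharp
// Raise the default 2.5 s KV timeout (a sketch; tune the value to your SLA).
var options = new ClusterOptions()
    .WithConnectionString("couchbase://your-host")   // placeholder
    .WithCredentials("user", "password");            // placeholder

// Applies to KV operations such as the GAT in the error above.
options.KvTimeout = TimeSpan.FromSeconds(5);

var cluster = await Cluster.ConnectAsync(options);
```

Note that this only hides slow operations; it does not address whatever is making them slow in the first place.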

Thanks for taking the time to respond. An initial look at the server dashboard shows that get ops are taking a long time; there isn't much information stored, so requests should typically complete in milliseconds. Nothing unusual in the logs. I'll look into the slow operations logging in the meantime.

Looking at the logging, there are a number of requests (81, 82, 83, 84) at "2023-11-28 09:33:58,792", and they are all for the same key. (Is this part of a load test?)

And while you did get responses within the timeout (barely) at "2023-11-28 09:34:01,293" (for opaques 81 and 82), the responses you received for opaques 83 and 84 at "2023-11-28 09:34:01,309" were beyond the 2.5-second timeout.

And all of the requests (at least the last three) logged:

Couchbase.Core.ClusterNode - CB: Current state is Closed.

So it would seem that re-establishing the connection was what took so long. If you run an application that makes a Couchbase connection, then sends one request and waits for the response - how long is it between the request and the response? (The first request will make the actual connection.) One thing that can make obtaining a connection slow is a slow response to a DNS lookup.
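A minimal console sketch of that timing test (the connection string, credentials, bucket name, and key are placeholders; `Stopwatch` captures the first-request latency, which includes connection setup):

```csharp
using System.Diagnostics;
using Couchbase;

var options = new ClusterOptions()
    .WithConnectionString("couchbase://your-host")  // placeholder
    .WithCredentials("user", "password");           // placeholder

var cluster = await Cluster.ConnectAsync(options);
var bucket = await cluster.BucketAsync("default");  // placeholder bucket
var collection = bucket.DefaultCollection();

// Time the first operation; this includes the cost of establishing
// the KV connection, so it is the number being asked about above.
var sw = Stopwatch.StartNew();
var result = await collection.GetAsync("some-known-key"); // placeholder key
sw.Stop();

Console.WriteLine($"First request took {sw.ElapsedMilliseconds} ms");
```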

Not a load test, genuine usage. It is not unusual to receive multiple requests a second for the same key if a user is accessing several services at once.

The logs are full of the message:

Couchbase.Core.ClusterNode - CB: Current state is Closed.

I just want to check that my implementation is correct.

Global.asax

static IWindsorContainer _container;
static ICluster _cluster;

protected void Application_Start(object sender, EventArgs e)
{
    ConfigWrapper cw = new ConfigWrapper();

    ILoggerFactory factory = new LoggerFactory();
    factory.AddLog4Net("log4net.config");

    ClusterOptions options = null;
    if (cw.GetConfigValue("UseLog4net") == "Y")
    {
        options = new ClusterOptions().WithConnectionString(cw.GetConfigValue("CouchBaseServer"))
            .WithCredentials(cw.GetConfigValue("CouchBaseUser"), cw.GetConfigValue("CouchBasePassword"))
            .WithLogging(factory);
    }
    else
    {
        options = new ClusterOptions().WithConnectionString(cw.GetConfigValue("CouchBaseServer"))
            .WithCredentials(cw.GetConfigValue("CouchBaseUser"), cw.GetConfigValue("CouchBasePassword"));
    }

    options.EnableDnsSrvResolution = false;
    options.KvConnectTimeout = TimeSpan.FromMilliseconds(12000);

    _cluster = Cluster.ConnectAsync(options).Result;

    // Allow 1 minute to connect before failing
    _cluster.WaitUntilReadyAsync(new TimeSpan(0, 1, 0));

    // Initialise Castle Windsor
    _container = new WindsorContainer();
    _container.AddFacility<WcfFacility>();
    _container.Install(new WindsorInstaller(_cluster));
}

Windsor Installer:

var bucket = _cluster.BucketAsync("default").Result;
Component.For<IBucket>().Instance(bucket).LifestyleSingleton()

Code retrieving data:

TimeSpan timeoutValue =
    TimeSpan.FromMinutes(Convert.ToDouble(_cw.GetConfigValue("CacheTimeOutInMins")));
var scope = _bucket.ScopeAsync("_default").Result;
var collection = scope.CollectionAsync("_default").Result;
try
{
    var result = collection.GetAndTouchAsync(guid, timeoutValue).Result;
    // check the region is a match before returning
    // (CacheItem stands in for the type argument, which the forum formatting stripped)
    var cache = result.ContentAs<CacheItem>();
    if (cache.CacheRegion == GetRegion())
    {
        // region match so return the data
        return result.ContentAs<CacheItem>();
    }
}

I’m not intimate with the internals of the .NET SDK - but it might be beneficial to obtain the collection at the same time as you obtain the _cluster and reuse it.

_cluster = Cluster.ConnectAsync(options).Result;
scope = _bucket.ScopeAsync("_default").Result;
collection = scope.CollectionAsync("_default").Result;
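A fuller sketch of that suggestion, resolving everything once at startup and registering the collection alongside the bucket (`ICouchbaseCollection` is the SDK's collection interface; the Windsor registration style mirrors the installer above, and this still uses .Result only to match the existing startup code):

```csharp
// At Application_Start, resolve once and reuse for the process lifetime.
// Holding the ICouchbaseCollection reference avoids repeating the
// scope/collection lookups on every request.
_cluster = Cluster.ConnectAsync(options).Result;
var bucket = _cluster.BucketAsync("default").Result;
var scope = bucket.ScopeAsync("_default").Result;
var collection = scope.CollectionAsync("_default").Result;

// Register both as singletons so request code never re-resolves them.
_container.Register(
    Component.For<IBucket>().Instance(bucket).LifestyleSingleton(),
    Component.For<ICouchbaseCollection>().Instance(collection).LifestyleSingleton());
```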

If I had to make a guess, I’d say all of the use of .Result is probably the issue. It’s a known problem that calling async methods from sync code this way causes all kinds of issues in .NET. You could be experiencing thread pool starvation. I strongly recommend converting your methods to asynchronous methods wherever you possibly can. Unfortunately, in legacy ASP.NET you can’t go async everywhere, but the more you do the better.
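For illustration, here is the retrieval code from earlier in the thread rewritten as a fully asynchronous method (a sketch; `CacheItem` stands in for the actual cached type and the method signature is assumed):

```csharp
// Hypothetical async rewrite of the retrieval code: no .Result anywhere,
// so no thread-pool thread is blocked while waiting on Couchbase.
private async Task<CacheItem> GetCachedAsync(string guid)
{
    TimeSpan timeoutValue =
        TimeSpan.FromMinutes(Convert.ToDouble(_cw.GetConfigValue("CacheTimeOutInMins")));

    var scope = await _bucket.ScopeAsync("_default");
    var collection = await scope.CollectionAsync("_default");

    var result = await collection.GetAndTouchAsync(guid, timeoutValue);
    var cache = result.ContentAs<CacheItem>();

    // Return the data only when the region matches, as in the original code.
    return cache.CacheRegion == GetRegion() ? cache : null;
}
```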

For the spots you can’t convert to be truly asynchronous, a library like this should be used: GitHub - CenterEdge/CenterEdge.Async: When you gotta, you gotta

This just means that the circuit breaker hasn’t been tripped and the operation will be attempted. It’s DEBUG-level logging and can be ignored (unless its state is Open, which means the circuit was tripped and the op will be retried until the circuit is closed again).

The bigger mystery is what is causing the timeout, as a timeout is always a side effect of something else. One thing to note with Docker and other containers: it’s very easy to hit network saturation when running in resource-constrained environments. If there are no other exceptions being thrown, this sounds like a likely reason for the timeouts - you simply cannot fit more operations onto the wire and everything slows down.


I believe that ‘opaque’ starts at 0 when Cluster.ConnectAsync() is called. So it looks like at this point only 85 requests have been made, with at least the last four in the same millisecond, for the same id. I can only guess - but I would guess - that the first 80 were also sent in the same millisecond, for the same id. I would take a close look at the calling code to ensure that it is not a non-blocking loop issuing the same id.
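As an illustration of how that pattern arises, a non-blocking loop that fires requests without awaiting each one will stamp several operations with the same millisecond and consecutive opaque values (hypothetical code, not taken from the thread):

```csharp
// Anti-pattern sketch: each iteration starts an operation but does not
// await it, so all four GATs hit the wire at once for the same key.
var tasks = new List<Task>();
for (int i = 0; i < 4; i++)
{
    tasks.Add(collection.GetAndTouchAsync(guid, expiry)); // not yet awaited
}
// Only here do we wait; the server sees a burst of identical requests,
// and the later ones can easily exceed the 2.5 s KvTimeout.
await Task.WhenAll(tasks);
```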


Thanks all. I was suspicious about the .Result calls on async methods, so I will start there.

In the code above I have specified:

options.EnableDnsSrvResolution = false;
options.KvConnectTimeout = TimeSpan.FromMilliseconds(12000);

Do these only apply when opening the Cluster?
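For reference, those two options govern different phases from the per-operation deadline seen in the original error, which is `KvTimeout`. A sketch showing all three side by side (connection string and credentials are placeholders; the comments describe my understanding of each option's scope):

```csharp
var options = new ClusterOptions()
    .WithConnectionString("couchbase://your-host")  // placeholder
    .WithCredentials("user", "password");           // placeholder

options.EnableDnsSrvResolution = false;              // bootstrap: skip the DNS-SRV lookup
options.KvConnectTimeout = TimeSpan.FromSeconds(12); // establishing each KV socket,
                                                     // including re-connections later on
options.KvTimeout = TimeSpan.FromSeconds(2.5);       // deadline for each KV operation
                                                     // (GAT, get, upsert, ...)
```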

If you run an application that makes a couchbase connection, then sends one request and waits for the response - how long between the request and the response? (the first request will make the actual connection).

Around 50-100ms. It’s very quick.

Unfortunately the async changes didn’t help. They have instead masked the error, as our logging now says “The Operation Was Cancelled.” It’s difficult to diagnose, as my tests return responses almost immediately, but in the production environment we are seeing the timeouts. All environments use the same Couchbase cluster.

This sounds kind of like it may be a networking issue. Have you tried running the SDK Doctor from the production environment?

OK. At what point do the operations go from being very quick to not being very quick? From the logging, the last four requests were all made at the same time (for the same doc). If the previous 80 requests were also launched at the same time (and especially upon startup), then it’s not surprising that the 83rd and 84th requests don’t complete in 2.5 seconds.

Will setup the SDK Doctor.

@mreiche - It is odd that there are 4 requests made at the same time, possibly more, as I extracted only a portion of the logs. The use case is a method that is called to retrieve the user’s logged-in data held in Couchbase upon visiting a new page. It’s extremely unlikely that 4 pages would have been opened by the same person at the same time. There are no loops in the code or anything; it’s a very basic grab-the-data-from-Couchbase-and-return-it-to-the-front-end flow. Maybe the connectivity monitor will pick up duplicate connections.

It’s extremely unlikely that 4 pages would have been opened by the same person at the same time. There are no loops in the code or anything

I wonder if there was some initial, blocking delay - and maybe the user clicked several times on the same page since it was not responsive - and then all the requests went through at once. Just speculation.

It would be interesting to see the logging for the previous 80 requests. If they are all at the same time, it would explain the timeout.

Forgot to hit reply yesterday! SDK Doctor logs below, not sure how to get the analytics service working.


Note: Diagnostics can only provide accurate results when your cluster
is in a stable state. Active rebalancing and other cluster configuration
changes can cause the output of the doctor to be inconsistent or in the
worst cases, completely incorrect.

15:00:15.772 INFO ▶ Parsing connection string couchbase://192.168.8.10/default
15:00:15.772 INFO ▶ Connection string identifies the following CCCP endpoints:
15:00:15.772 INFO ▶   1. 192.168.8.10:11210
15:00:15.772 INFO ▶ Connection string identifies the following HTTP endpoints:
15:00:15.772 INFO ▶   1. 192.168.8.10:8091
15:00:15.772 INFO ▶ Connection string specifies bucket default
15:00:15.772 WARN ▶ Your connection string specifies only a single host. You should consider adding additional static nodes from your cluster to this list to improve your applications fault-tolerance
15:00:15.772 INFO ▶ Performing DNS lookup for host 192.168.8.10
15:00:15.772 INFO ▶ Attempting to connect to cluster via CCCP
15:00:15.772 INFO ▶ Attempting to fetch config via cccp from 192.168.8.10:11210
15:00:15.779 WARN ▶ Bootstrap host 192.168.8.10 is not using the canonical node hostname of cbn02.internal.com. This is not neccessarily an error, but has been known to result in strange and challenging to diagnose errors when DNS entries are reconfigured.
15:00:15.779 INFO ▶ Selected the following network type: default
15:00:15.779 INFO ▶ Identified the following nodes:
15:00:15.779 INFO ▶   [0] cbn01.internal.com
15:00:15.779 INFO ▶     projector: 9999, capi: 8092, ftsGRPC: 9130
15:00:15.779 INFO ▶     indexScan: 9101, indexStreamInit: 9103, indexStreamMaint: 9105
15:00:15.779 INFO ▶     kv: 11210, mgmt: 8091, fts: 8094
15:00:15.779 INFO ▶     indexAdmin: 9100, indexHttp: 9102, indexStreamCatchup: 9104
15:00:15.779 INFO ▶     n1ql: 8093
15:00:15.779 INFO ▶   [1] cbn02.internal.com
15:00:15.779 INFO ▶     indexScan: 9101, indexStreamCatchup: 9104, indexStreamInit: 9103
15:00:15.779 INFO ▶     mgmt: 8091, projector: 9999, ftsGRPC: 9130
15:00:15.779 INFO ▶     fts: 8094, indexAdmin: 9100, indexHttp: 9102
15:00:15.779 INFO ▶     indexStreamMaint: 9105, kv: 11210, n1ql: 8093
15:00:15.779 INFO ▶     capi: 8092
15:00:15.779 INFO ▶   [2] cbn03.internal.com
15:00:15.779 INFO ▶     indexHttp: 9102, indexScan: 9101, indexStreamInit: 9103
15:00:15.779 INFO ▶     indexStreamMaint: 9105, projector: 9999, capi: 8092
15:00:15.779 INFO ▶     ftsGRPC: 9130, indexStreamCatchup: 9104, kv: 11210
15:00:15.779 INFO ▶     mgmt: 8091, n1ql: 8093, fts: 8094
15:00:15.779 INFO ▶     indexAdmin: 9100
15:00:15.779 INFO ▶ Fetching config from http://cbn01.internal.com:8091
15:00:15.790 INFO ▶ Received cluster configuration, nodes list:
[
  {
    "addressFamily": "inet",
    "addressFamilyOnly": false,
    "clusterCompatibility": 458753,
    "clusterMembership": "active",
    "configuredHostname": "cbn01.internal.com:8091",
    "couchApiBase": "http://cbn01.internal.com:8092/",
    "cpuCount": 2,
    "externalListeners": [
      {
        "afamily": "inet",
        "nodeEncryption": false
      }
    ],
    "hostname": "cbn01.internal.com:8091",
    "interestingStats": {
      "cmd_get": 3.1,
      "couch_docs_actual_disk_size": 130572054,
      "couch_docs_data_size": 40093377,
      "couch_spatial_data_size": 0,
      "couch_spatial_disk_size": 0,
      "couch_views_actual_disk_size": 0,
      "couch_views_data_size": 0,
      "curr_items": 20829,
      "curr_items_tot": 63163,
      "ep_bg_fetched": 0,
      "get_hits": 2.9,
      "mem_used": 70463200,
      "ops": 5,
      "vb_active_num_non_resident": 0,
      "vb_replica_curr_items": 42334
    },
    "mcdMemoryAllocated": 6385,
    "mcdMemoryReserved": 6385,
    "memoryFree": 4865429504,
    "memoryTotal": 8370176000,
    "nodeEncryption": false,
    "nodeHash": 48500460,
    "nodeUUID": "fbb46848f03f5003c65b3035e267356f",
    "os": "x86_64-pc-linux-gnu",
    "otpNode": "ns_1@cbn01.internal.com",
    "ports": {
      "direct": 11210,
      "distTCP": 21100,
      "distTLS": 21150
    },
    "recoveryType": "none",
    "services": [
      "fts",
      "index",
      "kv",
      "n1ql"
    ],
    "status": "healthy",
    "systemStats": {
      "allocstall": 0,
      "cpu_cores_available": 2,
      "cpu_stolen_rate": 0,
      "cpu_utilization_rate": 7.217883374643918,
      "mem_free": 4865429504,
      "mem_limit": 8370176000,
      "mem_total": 8370176000,
      "swap_total": 2147479552,
      "swap_used": 1048576
    },
    "thisNode": true,
    "uptime": "5023266",
    "version": "7.1.1-3175-community"
  },
  {
    "addressFamily": "inet",
    "addressFamilyOnly": false,
    "clusterCompatibility": 458753,
    "clusterMembership": "active",
    "configuredHostname": "cbn02.internal.com:8091",
    "couchApiBase": "http://cbn02.internal.com:8092/",
    "cpuCount": 2,
    "externalListeners": [
      {
        "afamily": "inet",
        "nodeEncryption": false
      }
    ],
    "hostname": "cbn02.internal.com:8091",
    "interestingStats": {
      "cmd_get": 0.8,
      "couch_docs_actual_disk_size": 90279545,
      "couch_docs_data_size": 40065535,
      "couch_spatial_data_size": 0,
      "couch_spatial_disk_size": 0,
      "couch_views_actual_disk_size": 0,
      "couch_views_data_size": 0,
      "curr_items": 21250,
      "curr_items_tot": 63163,
      "ep_bg_fetched": 0,
      "get_hits": 0.6,
      "mem_used": 70584912,
      "ops": 0.8,
      "vb_active_num_non_resident": 0,
      "vb_replica_curr_items": 41913
    },
    "mcdMemoryAllocated": 6385,
    "mcdMemoryReserved": 6385,
    "memoryFree": 4793425920,
    "memoryTotal": 8370180096,
    "nodeEncryption": false,
    "nodeHash": 26677891,
    "nodeUUID": "50635cacd0a6f8858e0ad9662554c1d0",
    "os": "x86_64-pc-linux-gnu",
    "otpNode": "ns_1@cbn02.internal.com",
    "ports": {
      "direct": 11210,
      "distTCP": 21100,
      "distTLS": 21150
    },
    "recoveryType": "none",
    "services": [
      "fts",
      "index",
      "kv",
      "n1ql"
    ],
    "status": "healthy",
    "systemStats": {
      "allocstall": 426,
      "cpu_cores_available": 2,
      "cpu_stolen_rate": 0,
      "cpu_utilization_rate": 9.638878912145952,
      "mem_free": 4793425920,
      "mem_limit": 8370180096,
      "mem_total": 8370180096,
      "swap_total": 2147479552,
      "swap_used": 0
    },
    "uptime": "4857928",
    "version": "7.1.1-3175-community"
  },
  {
    "addressFamily": "inet",
    "addressFamilyOnly": false,
    "clusterCompatibility": 458753,
    "clusterMembership": "active",
    "configuredHostname": "cbn03.internal.com:8091",
    "couchApiBase": "http://cbn03.internal.com:8092/",
    "cpuCount": 2,
    "externalListeners": [
      {
        "afamily": "inet",
        "nodeEncryption": false
      }
    ],
    "hostname": "cbn03.internal.com:8091",
    "interestingStats": {
      "cmd_get": 2.7,
      "couch_docs_actual_disk_size": 125398652,
      "couch_docs_data_size": 40095868,
      "couch_spatial_data_size": 0,
      "couch_spatial_disk_size": 0,
      "couch_views_actual_disk_size": 0,
      "couch_views_data_size": 0,
      "curr_items": 21084,
      "curr_items_tot": 63163,
      "ep_bg_fetched": 0,
      "get_hits": 2.4,
      "mem_used": 70503816,
      "ops": 31.6,
      "vb_active_num_non_resident": 0,
      "vb_replica_curr_items": 42079
    },
    "mcdMemoryAllocated": 6385,
    "mcdMemoryReserved": 6385,
    "memoryFree": 4972781568,
    "memoryTotal": 8370180096,
    "nodeEncryption": false,
    "nodeHash": 81368020,
    "nodeUUID": "bebb4788fd2d44c02978a7c31328315d",
    "os": "x86_64-pc-linux-gnu",
    "otpNode": "ns_1@cbn03.internal.com",
    "ports": {
      "direct": 11210,
      "distTCP": 21100,
      "distTLS": 21150
    },
    "recoveryType": "none",
    "services": [
      "fts",
      "index",
      "kv",
      "n1ql"
    ],
    "status": "healthy",
    "systemStats": {
      "allocstall": 0,
      "cpu_cores_available": 2,
      "cpu_stolen_rate": 0,
      "cpu_utilization_rate": 7.631311848515636,
      "mem_free": 4972781568,
      "mem_limit": 8370180096,
      "mem_total": 8370180096,
      "swap_total": 2147479552,
      "swap_used": 0
    },
    "uptime": "4766580",
    "version": "7.1.1-3175-community"
  }
]
15:00:15.792 INFO ▶ Successfully connected to Key Value service at cbn01.internal.com:11210
15:00:15.795 INFO ▶ Successfully connected to Management service at cbn01.internal.com:8091
15:00:15.798 INFO ▶ Successfully connected to Views service at cbn01.internal.com:8092
15:00:15.799 INFO ▶ Successfully connected to Query service at cbn01.internal.com:8093
15:00:15.801 INFO ▶ Successfully connected to Search service at cbn01.internal.com:8094
15:00:15.801 WARN ▶ Could not test Analytics service on cbn01.internal.com as it was not in the config
15:00:15.807 INFO ▶ Successfully connected to Key Value service at cbn02.internal.com:11210
15:00:15.813 INFO ▶ Successfully connected to Management service at cbn02.internal.com:8091
15:00:15.818 INFO ▶ Successfully connected to Views service at cbn02.internal.com:8092
15:00:15.822 INFO ▶ Successfully connected to Query service at cbn02.internal.com:8093
15:00:15.825 INFO ▶ Successfully connected to Search service at cbn02.internal.com:8094
15:00:15.825 WARN ▶ Could not test Analytics service on cbn02.internal.com as it was not in the config
15:00:15.830 INFO ▶ Successfully connected to Key Value service at cbn03.internal.com:11210
15:00:15.836 INFO ▶ Successfully connected to Management service at cbn03.internal.com:8091
15:00:15.840 INFO ▶ Successfully connected to Views service at cbn03.internal.com:8092
15:00:15.844 INFO ▶ Successfully connected to Query service at cbn03.internal.com:8093
15:00:15.847 INFO ▶ Successfully connected to Search service at cbn03.internal.com:8094
15:00:15.847 WARN ▶ Could not test Analytics service on cbn03.internal.com as it was not in the config
15:00:15.850 INFO ▶ Memd Nop Pinged cbn01.internal.com:11210 10 times, 0 errors, 0ms min, 0ms max, 0ms mean
15:00:15.867 INFO ▶ Memd Nop Pinged cbn02.internal.com:11210 10 times, 0 errors, 0ms min, 1ms max, 0ms mean
15:00:15.885 INFO ▶ Memd Nop Pinged cbn03.internal.com:11210 10 times, 0 errors, 0ms min, 1ms max, 0ms mean
15:00:15.885 INFO ▶ Diagnostics completed

Summary:
[WARN] Your connection string specifies only a single host. You should consider adding additional static nodes from your cluster to this list to improve your applications fault-tolerance
[WARN] Bootstrap host 192.168.8.10 is not using the canonical node hostname of cbn02.internal.com. This is not neccessarily an error, but has been known to result in strange and challenging to diagnose errors when DNS entries are reconfigured.
[WARN] Could not test Analytics service on cbn01.internal.com as it was not in the config
[WARN] Could not test Analytics service on cbn02.internal.com as it was not in the config
[WARN] Could not test Analytics service on cbn03.internal.com as it was not in the config

Found multiple issues, see listing above.

SDK Doctor output is normal. You don’t have the Analytics service in your cluster; that’s fine.

Hi all, and happy new year. I got fed up in the end and rewrote the web service as a .NET 8 Web API. This meant that the quick-start docs on the Couchbase documentation site made a lot more sense. I also changed everything over to async at the same time, and I am pleased to say the error has gone and everything is running much more smoothly.

Previously the service was WCF and everything ran synchronously; I believe the potentially hundreds of calls a second were queuing and causing the timeouts. Thank you all for your help on this matter.
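For anyone following the same path, a minimal registration sketch for the modern stack, assuming the `Couchbase.Extensions.DependencyInjection` NuGet package (the configuration keys and the marker interface name are illustrative):

```csharp
// Program.cs of a .NET 8 Web API (sketch).
using Couchbase.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

// Register the cluster once; the extension manages its lifetime.
builder.Services.AddCouchbase(options =>
{
    options.ConnectionString = builder.Configuration["Couchbase:ConnectionString"];
    options.UserName = builder.Configuration["Couchbase:User"];
    options.Password = builder.Configuration["Couchbase:Password"];
});

// Marker interface pattern: controllers inject IDefaultBucketProvider
// and call GetBucketAsync() instead of blocking with .Result.
builder.Services.AddCouchbaseBucket<IDefaultBucketProvider>("default");

var app = builder.Build();
app.Run();

// Marker interface for the bucket, per the DI extension's pattern.
public interface IDefaultBucketProvider : INamedBucketProvider { }
```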
