Hello,
We’re using .NET client version 2.5.12 with server 4.1.x. Seemingly at random (we can’t find a pattern), the client will start writing to the default bucket (we keep it for backwards compatibility; it should be empty), and even when it does that it reports a timeout to the caller. This happens on one or two servers out of a large set of servers, and it doesn’t resolve until the application hosting the client is restarted.
Our cluster and bucket are singletons, created once for the entire lifetime of the app, and the configuration looks like this (we inherit our config from ClientConfiguration):
    this.Bucket = bucketName;
    this.Servers.Clear();
    foreach (var url in knownUrls)
        this.Servers.Add(new Uri(url));
    this.BucketConfigs.Clear();
    this.BucketConfigs[bucketName] = new BucketConfiguration
    {
        PoolConfiguration = this.PoolConfiguration,
        Password = bucketPassword,
        BucketName = bucketName
    };
and then we simply do:
    var cluster = new Cluster(config);
    var client = cluster.OpenBucket(config.Bucket);
Unfortunately we don’t keep many logs from the trace, but we do log the errors returned from the upsert/get operations. They show StatusCode=OperationTimeout and Message=The operation has timed out, and the time it takes to return matches the default operation timeout (2500ms), which is different from the send timeout of 10000ms.
As noted, it only happens rarely, but it doesn’t heal by itself, which is the problematic part. Furthermore, it seems to be writing to a single server out of the whole cluster, so whatever confusion the client is in, it seems to be limited to a particular set of connections (we’re using multiplexing IO).
Similar reports all seem to involve older clients. Does anyone have any clue what might be happening? Our biggest suspicion is a race condition while fetching new configs from the server that leaves some clients pointing to the wrong place, but our attempts to reproduce it have been unsuccessful.
The SDK returns Timeout, but it still writes the value to the default bucket?
The DefaultOperationTimeout and SendTimeout are different: the first is the maximum time for an operation to attempt a successful fetch or mutate if you are sending a CAS; the second is the maximum time the client will wait for something from the server. If SendTimeout is exceeded, it generally means the connection was aborted by the server or something in between. In other words, one timeout is singular and the other aggregate: op level vs connection level.
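For reference, here is a minimal sketch of where the two timeouts are typically set in a 2.x client configuration. It assumes the operation-level timeout is exposed as `ClientConfiguration.DefaultOperationLifespan` and the connection-level one as `PoolConfiguration.SendTimeout`; those property names, and the hypothetical `TimeoutConfigSketch` wrapper, are assumptions to verify against your SDK version. The values are the ones mentioned in this thread.

```csharp
using Couchbase.Configuration.Client;

// Hypothetical helper class, only here to hold the fragment.
public static class TimeoutConfigSketch
{
    public static ClientConfiguration Build()
    {
        return new ClientConfiguration
        {
            // Operation level: upper bound (ms) for a single K/V operation.
            // Assumed property name; 2500 is the default quoted in the thread.
            DefaultOperationLifespan = 2500,

            PoolConfiguration = new PoolConfiguration
            {
                // Connection level: how long (ms) the client waits on the
                // socket for the server. 10000 is the value from the thread.
                SendTimeout = 10000
            }
        };
    }
}
```

Because an error returning after roughly 2500ms lines up with the operation-level setting, timing the failed call is a reasonable way to tell which of the two limits fired.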
It sounds like a bug to me. Have you tried upgrading to a later client version? 2.5.12 is fairly old and we release monthly, so that was quite a few versions ago!
-Jeff
Hi, thanks for taking the time to check it out.
I confirm that the error coming back to us from the SDK is in fact OperationTimeout, and the key appears stored on the default bucket while that error is being logged by our tools. That is why I mentioned SendTimeout, to discard some possibilities: the error comes back as OperationTimeout, and when we time how long it took, it’s close to the DefaultOperationTimeout… I guess that’s just extra info thrown at you, that’s all. This behavior might be related to bugs like Loading... where the error code is misreported.
The version upgrade is not possible for me at this point since I have to target 4.5.1 for a while so 2.5.x is the end of the road for me.
I’m going to browse all bugs closed since 2.5.12 to see if something related to the issue has been addressed; I might be able to recompile with the fix if that is the case.
@jmorris do you think it’s likely that multiplexing is causing this, or that the only way this could happen is the config provider confusing some mapping (from a mid-size cluster, more than 10 nodes)?
You should be able to use v2.6.x and on with 4.5.X - you just can’t use RBAC and any new features that the server supports - K/V and Query should work IIRC.
I doubt it’s related to mux - it’s more likely related to config handling, I would think, but I’m really not sure. To be clear, your configuration does not have any definition for the “default” bucket?
I meant .NET 4.5.1; I think you are talking about CB server 4.5.x.
No intentional or explicit references to the default bucket. That is the intention of emptying BucketConfigs when creating the client configuration object.
I see, my mistake, it’s the .NET dependency you are referring to!
I took a look at the protocol, and it seems that requests do not carry information about the bucket they are targeting; rather, that is established when creating/authenticating the socket to each node. Is that correct? If so, the only way to have this happen is if that step uses the default bucket name and credentials (by some mistake in the app or in the client, hence your question) or if the server has a bug. Can you confirm these statements are accurate?
Yes, buckets are associated with sockets during the negotiation phase just after authentication.
Yes, indeed.
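To make the consequence of that concrete, here is a toy model (not the Couchbase SDK; `KvRequest`, `ToyServer` and `ToyConnection` are hypothetical names invented for illustration). It only shows the point confirmed above: the request carries no bucket name, so a connection that was negotiated against the wrong bucket will keep writing there for as long as it lives.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: none of these types exist in the Couchbase SDK.
public sealed class KvRequest
{
    public string Key;
    public string Value;
    // Deliberately no bucket field: the request does not name its target.
}

public sealed class ToyServer
{
    private readonly Dictionary<string, Dictionary<string, string>> buckets =
        new Dictionary<string, Dictionary<string, string>>();

    // Returns the named bucket, creating it on first use.
    public Dictionary<string, string> Bucket(string name)
    {
        Dictionary<string, string> bucket;
        if (!buckets.TryGetValue(name, out bucket))
        {
            bucket = new Dictionary<string, string>();
            buckets[name] = bucket;
        }
        return bucket;
    }
}

public sealed class ToyConnection
{
    private readonly ToyServer server;
    private readonly string boundBucket;

    // The bucket is fixed once, when the connection is negotiated.
    public ToyConnection(ToyServer server, string bucketAtNegotiation)
    {
        this.server = server;
        this.boundBucket = bucketAtNegotiation;
    }

    // Every request on this connection lands in the bound bucket.
    public void Upsert(KvRequest request)
    {
        server.Bucket(boundBucket)[request.Key] = request.Value;
    }
}

public static class Program
{
    public static void Main()
    {
        var server = new ToyServer();
        var good = new ToyConnection(server, "mybucket");
        var wrong = new ToyConnection(server, "default"); // bound to the wrong bucket

        good.Upsert(new KvRequest { Key = "k1", Value = "v1" });
        wrong.Upsert(new KvRequest { Key = "k2", Value = "v2" });

        Console.WriteLine(server.Bucket("mybucket").ContainsKey("k1")); // True
        Console.WriteLine(server.Bucket("default").ContainsKey("k2"));  // True
        Console.WriteLine(server.Bucket("mybucket").ContainsKey("k2")); // False
    }
}
```

In this model nothing at the request level can correct the mistake, which is consistent with the symptom only clearing when the application (and therefore its connections) is restarted.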