I am trying to insert a large number of documents into Couchbase (CB) without success. I assumed that inserting 100K documents should not be a problem for a database engine like this (following the suggestions in Bulk Operations | Couchbase Docs). But I am getting timeouts in my local environment, which has 12 cores and an SSD. After some time CB starts timing out. The .NET SDK's retry logic runs, but the retry times out as well, and in the end a .NET exception is thrown.
What is the reason for these timeouts? Why can't the server handle this level of concurrency? Or is the client SDK causing the problem (it is CouchbaseNetClient 3.2.9)?
This is not a big workload, so increasing the timeout shouldn't be the solution.
For example, when I run the same workload against SQL Server I don't get timeouts. So how can I be sure that we will not face timeouts in a production environment?
Bucket:
Durability level: Majority and persist to active
No replicas
Document:
{
    "type": "productX",
    "productSourceId": 642930,
    "productCode": "XXXXX",
    "productName": "W/CXX/8/XXXXX/XX",
    "class": "Cosmetics",
    "assetGroupId": 110291,
    "barcode": "88888888888888",
    "sizeId": 248,
    "size": "STD",
    "colorId": 0,
    "color": "BR77",
    "colorFamily": "yyyyyyyy",
    "colorName": "FFFFF",
    "imageUrl": null,
    "webName": "",
    "createdDate": "2020-12-09 20:12:33.8272886 +03:00",
    "subDivisionId": 250,
    "subDivision": "wwwwww",
    "seasonId": 4244,
    "season": "SS",
    "productId": 642930
}
.NET code:
async Task InsertParallelAsync()
{
    Console.WriteLine("---InsertParallelAsync");
    var cluster = await Cluster.ConnectAsync($"couchbase://localhost", "Administrator", "111111");
    var collection = (await cluster.BucketAsync("Test")).DefaultCollection();
    var itemList = System.Text.Json.JsonSerializer.Deserialize<List<ProductItem>>(System.IO.File.ReadAllText("product_100k.json"));
    var taskList = new List<Task<IMutationResult>>();
    var sw = new System.Diagnostics.Stopwatch();
    sw.Start();
    foreach (var item in itemList)
    {
        var task = collection.InsertAsync(Guid.NewGuid().ToString(), item);
        taskList.Add(task);
    }
    await Task.WhenAll(taskList);
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
}
Exception:
An exception of type ‘Couchbase.Core.Exceptions.AmbiguousTimeoutException’ occurred in System.Private.CoreLib.dll but was not handled in user code: ‘The operation /556 timed out after 00:00:05. It was retried 1 times using Couchbase.Core.Retry.BestEffortRetryStrategy.’
Additionally, I tried Couchbase.Extensions.MultiOp, the extension library meant to overcome this issue, but its performance is not acceptable: inserting the documents sequentially, without any parallelism, gives approximately the same performance.
Inserting 100K documents using Couchbase.Extensions.MultiOp:
async Task InsertParallelWithOptimizationAsync()
{
    Console.WriteLine("---InsertParallelWithOptimizationAsync");
    var cluster = await Cluster.ConnectAsync($"couchbase://localhost", "Administrator", "111111");
    var collection = (await cluster.BucketAsync("Test")).DefaultCollection();
    var itemList = System.Text.Json.JsonSerializer.Deserialize<List<ProductItem>>(System.IO.File.ReadAllText("product_100k.json"))
        .Select(e =>
        {
            e.id = Guid.NewGuid().ToString();
            return e;
        }).ToDictionary(e => e.id);
    var sw = new System.Diagnostics.Stopwatch();
    sw.Start();
    var result = await collection.Insert(itemList).ToList();
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
}
Takes 1091672 ms (~18 minutes)
Inserting 100K documents without any parallelism:
async Task InsertSequentialAsync()
{
    Console.WriteLine("---InsertSequentialAsync");
    var cluster = await Cluster.ConnectAsync($"couchbase://localhost", "Administrator", "111111");
    var collection = (await cluster.BucketAsync("Test")).DefaultCollection();
    var itemList = System.Text.Json.JsonSerializer.Deserialize<List<ProductItem>>(System.IO.File.ReadAllText("product_100k.json"));
    var sw = new System.Diagnostics.Stopwatch();
    sw.Start();
    foreach (var item in itemList)
    {
        await collection.InsertAsync(Guid.NewGuid().ToString(), item);
    }
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
}
Takes 1209179 ms (~20 minutes)
So I couldn't find a performant way to insert 100K documents into CB from a single client machine. There should be a way to benefit from concurrency, since this is a server system.
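For reference, a middle ground between the two extremes above (all 100K operations started at once versus one at a time) would be to cap the number of in-flight operations. The following is only a minimal sketch of that idea and no timing for it is reported here. The helper name BoundedParallel.ForEachAsync and the limit of 256 in the usage note are hypothetical, chosen for illustration; neither comes from the Couchbase SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the Couchbase SDK): runs an async
// operation for every item while keeping at most maxInFlight running.
static class BoundedParallel
{
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, int maxInFlight, Func<T, Task> operation)
    {
        using var gate = new SemaphoreSlim(maxInFlight);
        var tasks = new List<Task>();

        foreach (var item in items)
        {
            // Wait here until one of the running operations frees a slot.
            await gate.WaitAsync();
            tasks.Add(RunAsync(item));
        }

        // Surface any failures after everything that was started has finished.
        await Task.WhenAll(tasks);

        async Task RunAsync(T item)
        {
            try
            {
                await operation(item);
            }
            finally
            {
                // Always release the slot, even if the operation throws.
                gate.Release();
            }
        }
    }
}
```

With the collection and itemList from the snippets above, it could be called as: await BoundedParallel.ForEachAsync(itemList, 256, item => collection.InsertAsync(Guid.NewGuid().ToString(), item)); The value 256 is an arbitrary starting point and would have to be tuned by measurement.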