SDK .NET 3.0.1 vs SDK .NET 2.7 Performance Difference

Hi,

I tried to convert our very large Couchbase application to the new SDK 3.0.1. Our main concern is performance. Our application relies mainly on the get-by-key feature and must be able to retrieve a lot of keys in the most performant way possible.

The typical usage is: receive a batch of keys, retrieve all the documents, return the documents.

On our benchmark test:

  • Couchbase cluster: 1 node, 8 CPU @ 4 GHz, 64 GB RAM, Ubuntu 18.04
  • Application server: 1 node, 16 CPU @ 4 GHz, 64 GB RAM, Ubuntu 18.04
  • Connection: wired 1 Gb/s (no firewall, no proxy)

Operation: 100,000 × GetAsync(key) by partition range (TransformBlock, BufferBlock), partitioned by 50
Couchbase Connection Pool: Min = 25, Max = 25

SDK 2.7: 5-6 s to retrieve the 100,000 records (bucket ops > 20K)
SDK 3.0.1: 380 s to retrieve the same 100,000 records (bucket ops = 300-400)

Our code is more or less:

return await keys.SelectParallelRangeAsync(k => collection.GetAsync(k), 50).ConfigureAwait(false);

SelectParallelRangeAsync is a helper:

public static async Task<T2[]> SelectParallelRangeAsync<T, T2>(this IEnumerable<T> sequence, Func<T, Task<T2>> action, int batchSize)
{
    var batcher = new TransformBlock<T, T2>(doc => action(doc), new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = batchSize });
    var buffer = new BufferBlock<T2>();
    batcher.LinkTo(buffer);
    foreach (var s in sequence)
    {
        batcher.Post(s);
    }
    batcher.Complete();
    await batcher.Completion.ConfigureAwait(false);
    if (buffer.TryReceiveAll(out var results))
    {
        return results.ToArray();
    }
    return null;
}

For now we are sticking with 2.7 for performance reasons. Please advise us.

Best regards,

David.

In theory, SDK 3 is supposed to be faster. It’s had a lot of internal rewrites to use more optimal structures like Span<T> and to reduce heap allocations. That said, it’s certainly possible that we’ve introduced new problems and bottlenecks. I’m definitely not seeing performance that low in my testing, so I’m wondering if it’s specifically related to your usage pattern interacting in an unexpected way.

Is there any chance you can provide something like a dotTrace output that may show where the bottlenecks are? If not, that’s fine, I can try to replicate it myself. But reducing variables in the reproduction is always best, if possible.


Thanks for the quick answer.
After further investigation, it appears this behavior can be replicated… and it is limited to “DEBUG” mode in Visual Studio. If I launch the benchmark with Ctrl+F5, performance is good; if I launch it in debug mode (F5), it is very slow…

We did not have this behavior with 2.7… I’m now investigating on the Visual Studio side.

My full program:

using Couchbase;
using Couchbase.KeyValue;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

namespace TestCbPerf
{
class Program
{
    static async Task Main(string[] args)
    {
        var cluster = await Cluster.ConnectAsync("couchbase://cb", new ClusterOptions() { UserName = "Administrator", Password = "password", MaxKvConnections = 25, NumKvConnections = 25 });
        var bucket = await cluster.BucketAsync("DEM");
        var collection = bucket.DefaultCollection();
        var keys = new int[100000];
        using (_ = new Benchmark("Test"))
        {
            var results = await SelectParallelRangeAsync(keys, (k) => GetOneAsync(collection, "key"), 50);
            var r = results.AsParallel().Select(x => x?.ContentAs<dynamic>()).ToArray();
            Console.WriteLine(r.Length);
        }
    }

    static async Task<IGetResult> GetOneAsync(ICouchbaseCollection collection, string k)
    {
        try
        {
            return await collection.GetAsync(k);                                
        }
        catch (Exception)
        {
            return null;
        }
    }

    static async Task<T2[]> SelectParallelRangeAsync<T, T2>(IEnumerable<T> sequence, Func<T, Task<T2>> action, int batchSize)
    {
        var batcher = new TransformBlock<T, T2>(doc => action(doc), new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = batchSize });
        var buffer = new BufferBlock<T2>();
        batcher.LinkTo(buffer);
        sequence.AsParallel().ForAll(s => batcher.Post(s));
        batcher.Complete();
        await batcher.Completion.ConfigureAwait(false);
        if (buffer.TryReceiveAll(out var results))
        {
            return results.ToArray();
        }
        return null;
    }
}

public class Benchmark : IDisposable
{
    private readonly string name = null;
    private readonly Stopwatch watch = null;
    private readonly Benchmark last = null;

    private static AsyncLocal<Benchmark> CurrentAsync = new AsyncLocal<Benchmark>();

    public Benchmark(string name, bool onlyEnd = false)
    {
        this.name = name;
        this.last = CurrentAsync.Value;
        CurrentAsync.Value = this;
        watch = new Stopwatch();
        if (!onlyEnd)
        {
            System.Diagnostics.Debug.WriteLine($"{name} Starting");
        }
        watch.Start();
    }

    public static void StepCurrent(string stepName)
    {
        CurrentAsync.Value?.Step(stepName);
    }

    public void Step(string stepName)
    {
        var elapsedTime = $"{watch.Elapsed.TotalSeconds:000.00} s";
        var label = $"{name} {stepName}";
        System.Console.WriteLine($"{label,-30} {elapsedTime,10}");
        System.Diagnostics.Debug.WriteLine($"{label,-30} {elapsedTime,10}");
    }

    public void Dispose()
    {
        watch.Stop();
        Step("Ended");
        CurrentAsync.Value = last;
    }
}

}

David

Very interesting.

One thing that jumps out at me is the lack of ConfigureAwait(false) in GetOneAsync. But it doesn’t seem like a likely culprit, since this program appears to run in the default no-op SynchronizationContext.
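To illustrate the point, here is a sketch of what the GetOneAsync wrapper from the program above would look like with the ConfigureAwait(false) added (behavior is otherwise unchanged):

```csharp
static async Task<IGetResult> GetOneAsync(ICouchbaseCollection collection, string k)
{
    try
    {
        // ConfigureAwait(false) skips capturing the current SynchronizationContext,
        // avoiding context hops when one exists (a no-op in console apps).
        return await collection.GetAsync(k).ConfigureAwait(false);
    }
    catch (Exception)
    {
        return null;
    }
}
```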

Hi @David_Allaigre -

I created a task to evaluate the performance difference while in DEBUG mode: NCBC-2536.

As an aside, I was wondering why you use parallelization for IO-bound code? Generally, we suggest using Task.WhenAll(tasks) to batch requests.
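For reference, a minimal Task.WhenAll version of the same fetch might look like the sketch below (this assumes the GetOneAsync wrapper from the program above, and that keys holds the document IDs; note it fires all the gets at once, with no cap on in-flight operations):

```csharp
// Start all gets at once and await them together.
var tasks = keys.Select(k => GetOneAsync(collection, k.ToString()));
var results = await Task.WhenAll(tasks).ConfigureAwait(false);
Console.WriteLine(results.Count(r => r != null));
```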

-Jeff

Thanks,
We are using the ranging because the real application is a farm of web servers.

If someone requests a large amount of data, this generates a lot of tasks and prevents other users from using the system by starving the resources during that time.

We also see very strange behavior with large numbers:
setup: NumKvConnections = MaxKvConnections = 2

for a large number like 1,000 keys:
ranging by 25 => average: 0.10 s
Task.WhenAll => average: 0.10 s

for a large number like 10,000 keys:
ranging by 25 => average: 1.0 s
Task.WhenAll => average: 1.0 s (but the system sometimes hangs)

for a very large number like 100,000 keys:
ranging by 25 => average: 10 s
Task.WhenAll => the system hangs every time

Also, with the pipeline system in your SDK, if I create a lot of “Get” operations for one HTTP request, they are pooled one after another. If another request comes in, even if I queue its “Get”s, it will have to wait. Even in parallel, the first caller requesting a large amount of data makes all the others wait more or less…
So using the range allows us to distribute access to the connection pool between requests, and thus to keep response times as consistent as possible (a sort of time/resource-sharing multithreading).

For a web application the main question is always: do you want to respond as quickly as possible, or do you want consistency and the ability to answer a lot of requests? With our SelectParallelRangeAsync and a lot of other optimisations we are able to do both!

Most of the time the number of requested keys is below the batchSize, in which case it’s more or less equivalent to Task.WhenAll… I always warn my team about Task.WhenAll when you have no idea how many tasks you will generate. The batchSize puts a sort of cap on that.
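That cap can also be expressed without TPL Dataflow. Below is a hedged sketch of a chunked Task.WhenAll helper (WhenAllBatchedAsync is a hypothetical name, not an SDK API) that awaits each fixed-size chunk before starting the next:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class Batching
{
    // Awaits items in chunks of batchSize, so at most batchSize tasks
    // are in flight at any time: a cap, like SelectParallelRangeAsync.
    public static async Task<TOut[]> WhenAllBatchedAsync<TIn, TOut>(
        IEnumerable<TIn> items, Func<TIn, Task<TOut>> action, int batchSize)
    {
        var results = new List<TOut>();
        foreach (var chunk in items
            .Select((item, i) => (item, i))
            .GroupBy(x => x.i / batchSize, x => x.item))
        {
            // Each chunk completes fully before the next one starts.
            results.AddRange(await Task.WhenAll(chunk.Select(action)));
        }
        return results.ToArray();
    }
}
```

One trade-off worth noting: unlike a Dataflow block with MaxDegreeOfParallelism, this waits for the slowest task in each chunk before starting the next, so the Dataflow pipeline tends to keep the connection slightly fuller.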

The best results we have so far: a connection pool to Couchbase of 2 to 5, depending on network speed, and a request batchSize of 2 to 3 × poolSize. With that, we avoid most of the timeouts and get a consistent response time even under full load.

If you are interested, I would be glad to introduce you to our product (a sort of all-in-one digital business application server, including CRM, eCommerce, PIM, Tasks, Full-Text Search) fully based on Couchbase.

Regards,

David.

@David_Allaigre -

Thanks for the details and feedback; it helps us understand the issue and improve the SDK! Yes, even with WhenAll you will still have to batch.

That would be awesome!

-Jeff

I’d be interested too, as I’m sure a couple of my colleagues would be.

Hi,

Could you please contact me at my email, david.allaigre@bdverse.com, so we can schedule a presentation.

Regards,

David.


Hi, I would like to check whether you are still interested in a presentation of our product. If so, send me an email and I will contact you directly.

Regards,
David.
david.allaigre@bdverse.com

I’ll send you an email right now. Thanks!