Large number of file descriptors results in failure

Hi,

(tested with 2.0.14 and 2.4.5)

When I open more than 1024 sockets, Couchbase reports a timeout, but as far as I can tell from the code and old bug reports it is actually out of FD space.

I have ulimit -n 1000000, and the operations on the opened sockets work without issue (in my bigger app).

The code that demonstrates the failure is below; just bump the FD count from 1023 to 1024 and it fails. (I cannot upload any file other than images, so I pasted it, very probably horribly mangled.)

#include <stdio.h>
#include <unistd.h>
#include <libcouchbase/couchbase.h>
#include <string.h> /* memset */

static void bootstrap_callback(lcb_t instance, lcb_error_t err){
    if (err == LCB_SUCCESS) {
        printf(
                "INFO; %s: Instance is ready.\n",
                __func__);
    } else {
        printf("ERROR; %s: Error %s", __func__,
                lcb_strerror(instance, err));
    }
}


int cbi_init(lcb_t *instance) {
    struct lcb_create_st create_options;
    lcb_error_t err;

    memset(&create_options, 0, sizeof(create_options));
    create_options.v.v0.host = "localhost.localdomain:8091";
    create_options.v.v0.user = "";
    create_options.v.v0.passwd = "";
    create_options.v.v0.bucket = "default";

    err = lcb_create(instance, &create_options);
    if (err != LCB_SUCCESS) {
        printf("ERROR; %s: Failed to create libcouchbase instance: %s\n",
                __func__, lcb_strerror(NULL, err));
        return 0;
    }

    /* Set up the handlers */
    (void) lcb_set_bootstrap_callback(*instance, bootstrap_callback);

    /*
     * Initiate the connect sequence in libcouchbase
     */
    if ((err = lcb_connect(*instance)) != LCB_SUCCESS) {
        printf(
                "ERROR; %s: Failed to initiate connect: %s\n",
                __func__, lcb_strerror(NULL, err));
        return 0;
    } else {
        printf(
                "INFO; %s: Connect request started \n",
                __func__);
    }

    /* Run the event loop and wait until we've connected.
     * The normal yield function is not used here because it would run the
     * event loop anyway, and the connect should finish completely first. */
    lcb_wait(*instance);
    return 1;
}


#define NR_FDS 1015
const char *testfilename = "/tmp/bla.txt";

FILE* fd[NR_FDS];

int main(void)
{
    lcb_t instance;

    unlink(testfilename);
    FILE *f = fopen(testfilename,"w");
    if (!f){
        printf("ERROR; cannot create test file\n");
        return(1);
    }
    fwrite("test",1,4,f);
    fclose(f);

    printf("Opening FD\n");
    int i;
    for (i = 0; i < NR_FDS; i++) {
        fd[i] = fopen(testfilename, "r");
        if (!fd[i]) {
            printf("ERROR; cannot open more than %d FDs\n",i);
            return(1);
        }
    }
    printf("Inspecting reading from FDs\n");
    char str[100];
    for (i = 0; i < NR_FDS; i++) {
        if (fread(str,1,4,fd[i]) != 4) {
            printf("ERROR; cannot read from more than %d FDs\n",i);
            return(1);
        }
    }

    printf("Opening couchbase connection\n");

    if (!cbi_init(&instance))
        return 1;

    printf("closing\n");

    lcb_destroy(instance);
    return 0;
}

What is your OS limit here (and what is the OS)? Also be sure to subtract the following file descriptors (a snippet to check the actual limit follows the list):

  1. stdin
  2. stdout
  3. stderr
  4. if using libevent, that might want an extra fd too
  5. the resolver might also open its own fd
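
If it helps, here is a minimal sketch (plain POSIX, nothing Couchbase-specific) to print the limit your process actually sees, rather than trusting the shell's ulimit:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    /* RLIMIT_NOFILE is the per-process cap on open file descriptors. */
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("soft limit: %llu, hard limit: %llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);
    return 0;
}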

btw, you can put triple backticks at the beginning and end of a code block to have it format properly

I enhanced the sample to show that I can read from the FDs, but that couchbase seems to have issues with it.

The system limit is well over 1024.

I am not using the libevent version, although I have libevent and libev installed on the system.

The sample above breaks somewhere between 1010 and 1020 non-Couchbase FDs.

This is on a fully updated 64-bit Debian.

Digging through the web, I find other people hitting this limit. My app unfortunately does massive I/O and needs more than 1024 FDs (in fact, about 100 times more than that), so this limit, which seems to be local to the Couchbase library, is really hurting me.

Is the library compiled against legacy header files (which mention a limit of 1024)?
I will try compiling it myself and see if that changes anything, but I would suggest that people from Couchbase look at this as well.

I found that if one opens the Couchbase connection before the massive FD opening, and then performs operations on the connection, one gets a really nice segfault.

Hans

Hrm, this might be more an issue with select() than with the library – in that case you should use the libevent or libev plugin (install libcouchbase2-libevent or libcouchbase2-libev). Unfortunately the default select() implementation has no way of dealing with fds numbered higher than FD_SETSIZE, which to my knowledge is not adjustable. The fd_set structure is a simple array and thus must be properly bounded.

It’s technically possible to hack fd_set and FD_SETSIZE to be larger, but in such an event I’d advocate using the libevent plugin, which properly uses epoll on Linux and therefore does not suffer from this issue.
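
To make the constraint concrete, a small sketch (illustration only, not code from the library):

#include <stdio.h>
#include <sys/select.h>

int main(void)
{
    fd_set set;

    FD_ZERO(&set);
    printf("FD_SETSIZE = %d\n", (int)FD_SETSIZE); /* typically 1024 on Linux */
    /* FD_SET(FD_SETSIZE, &set) would write past the end of 'set': a silent
     * buffer overrun, which is why descriptors numbered FD_SETSIZE or above
     * cannot work with a select()-based event loop. */
    return 0;
}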

Moving to libev indeed solves this.
Case closed for me.
However, the error code given is misleading. That has been signalled before.

Unfortunately the issues here are caused by buffer overruns; there isn’t a specific error code we can convey, and we really have no place to convey it. I guess we could always abort()/assert().

I’ve filed an issue on this: https://issues.couchbase.com/browse/CCBC-567

Hi I have the same problem recently. After installing the libev or libevent, what is the next step? Do you use libev in sync mode or async mode? Could you share a simple example? Thanks.

If you are using libev, you have to select it explicitly when doing lcb_create:

struct lcb_create_io_ops_st cio = {0};
lcb_io_opt_t io = NULL;
lcb_error_t err;

cio.v.v0.type = LCB_IO_OPS_LIBEV;
err = lcb_create_io_ops(&io, &cio);
/* check err == LCB_SUCCESS before continuing */

struct lcb_create_st cropts = {0};
cropts.version = 3;
cropts.v.v3.connstr = "couchbase://host1,host2,host3";
cropts.v.v3.io = io;
err = lcb_create(&instance, &cropts);

Or, if you are using NULL (the default for IO), you can export the LCB_IOPS_NAME=libev environment variable.
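
If you prefer to set it from inside the program rather than the shell, a small sketch (this only works if it runs before the first lcb_create):

#include <stdlib.h>

/* Same effect as `export LCB_IOPS_NAME=libev` in the shell; call this
 * before the first lcb_create so the default IO plugin lookup sees it. */
static int select_libev_via_env(void)
{
    return setenv("LCB_IOPS_NAME", "libev", 1);
}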

Do you know how to use libev in sync mode? My application is developed in sync mode now.

I think you should try to pass the event loop from your application into libcouchbase. This example shows how it can be done with libevent:

With libev it will be similar. Once you have passed it in, you control when to run the libev event loop.
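
A rough sketch of what passing your own loop looks like with the libevent plugin (assuming lcb_create_libevent_io_opts() from <libcouchbase/libevent_io_opts.h>; error handling trimmed, connection string is a placeholder):

#include <string.h>
#include <event2/event.h>
#include <libcouchbase/couchbase.h>
#include <libcouchbase/libevent_io_opts.h>

/* Hook libcouchbase into an event base that the application owns. */
int attach_to_own_loop(lcb_t *instance, struct event_base *evbase)
{
    lcb_io_opt_t io = NULL;
    struct lcb_create_st cropts;

    /* Wrap our event base in an IO-options object. */
    if (lcb_create_libevent_io_opts(0, &io, evbase) != LCB_SUCCESS)
        return -1;

    memset(&cropts, 0, sizeof cropts);
    cropts.version = 3;
    cropts.v.v3.connstr = "couchbase://localhost"; /* adjust to your cluster */
    cropts.v.v3.io = io;

    if (lcb_create(instance, &cropts) != LCB_SUCCESS)
        return -1;
    lcb_connect(*instance);

    /* The application now decides when (and how) to pump the loop: */
    event_base_loop(evbase, EVLOOP_NONBLOCK);
    return 0;
}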

If it is not what you are looking for, could you show minimal working example, so that I can compile and execute it, and give review notes.

If I use libevent in async mode, how can we enforce a first-come-first-served rule for requests?

At the moment libcouchbase guarantees that commands scheduled to the same node will be ordered, i.e. first-come-first-served. But it is not possible to guarantee that if your cluster has more than one node, as the nodes can process requests at different speeds.

Could you be more specific and show the code snippet which demonstrates your issue/question?

@avsej Sorry, I can’t share the code snippet here since it is company code. My code is used in a plugin installed in Apache Traffic Server (ATS), which is also an event-driven framework. When ATS gets a request, it triggers my plugin to process it. My plugin creates 20 threads, each with one lcb_t instance working in sync mode. The Couchbase lookup is part of the request processing: once my plugin gets a request, it assigns the request to one of the threads to perform the lookup, then returns control back to ATS. After the lookup response arrives, the thread triggers ATS to wake my plugin to continue processing the request.

Previously I tried to use libevent in my solution, but I didn’t know how to integrate libevent with ATS. I tried to create a dedicated thread running event_base_loop(), but I found that the average and maximum latencies were worse than with the current sync-mode solution. But now the number of file descriptors opened in the ATS box is over 40k (contributed by other ATS plugins), and sync mode with the default event loop doesn’t support the >1024 case. This is my problem.

Sorry for being a spoilsport, and I hate myself for telling you “you are asking the wrong questions”, like one sees so often from self-important programmers, but I strongly suggest that you ask some deeper questions first before going further. Also, not knowing all of the details, I may tell you things that you already know. In that case, just ignore me please.

I understand you wanted to stay synchronous because your code base was like that. I respect that. But it now shows that you are using that sync behaviour for things that are in essence incompatible with Couchbase. Couchbase (like many other NoSQL systems) is a distributed system, and in essence cannot guarantee ordered handling, as avsej also mentioned.

Please ask yourself, why do you think you need first-come-first-serve?
Do you need it to support transactions per-document? If so, please use CAS, not timing (a sketch follows below).
Do you need it to support transactions between documents? That is a LOT more difficult, and you will not only need CAS, but also to dive into things like two-phase commits etc.
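
For the per-document case, here is a sketch of a CAS-guarded update, as I read the 2.x C API (register store_cb via lcb_set_store_callback; error handling trimmed): store with the CAS you got from the preceding get, and a stale CAS comes back as LCB_KEY_EEXISTS, at which point you re-read and retry.

#include <string.h>
#include <libcouchbase/couchbase.h>

/* Store callback: a CAS mismatch means someone else changed the
 * document since we read it; re-read and retry in that case. */
static void store_cb(lcb_t instance, const void *cookie, lcb_storage_t op,
                     lcb_error_t err, const lcb_store_resp_t *resp)
{
    (void)instance; (void)cookie; (void)op; (void)resp;
    if (err == LCB_KEY_EEXISTS) {
        /* lost the race: schedule a fresh get and retry */
    }
}

static void update_with_cas(lcb_t instance, const char *key,
                            const char *val, lcb_cas_t cas_from_get)
{
    lcb_store_cmd_t cmd;
    const lcb_store_cmd_t *cmds[] = { &cmd };

    memset(&cmd, 0, sizeof cmd);
    cmd.v.v0.key = key;
    cmd.v.v0.nkey = strlen(key);
    cmd.v.v0.bytes = val;
    cmd.v.v0.nbytes = strlen(val);
    cmd.v.v0.operation = LCB_SET;
    cmd.v.v0.cas = cas_from_get; /* reject if the doc changed since the read */
    lcb_store(instance, NULL, 1, cmds);
}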

In short: if you need temporal consistency, it will be up to you to enforce it, using a context that you set; you cannot rely on a distributed system to do so, certainly not globally (well, you could get close by using algorithms like Paxos, but then things get even wilder).

Also, please try to understand the CAP theorem. Although somewhat flawed, it will give you a bit more insight in some of the pitfalls of distributed systems.

To go back to the async/sync thing:

  1. No longer requiring first-come-first-served, and using the async API, will allow you to achieve MUCH higher throughputs. 10 to 100 times more is feasible, but only at scale. If you run a purely sequential test case, async will be slower than sync. If you run a concurrent/parallel test, then async, if well written, will be a lot faster. Maybe that is what you saw.
  2. In the dedicated-thread test that you did (which is a proven standard design pattern, so it was in the right direction), please look at your thread communication and synchronisation methods. Queues come in many sorts, and whichever way you build them (lock-free, locks, semaphores, or any other method), using them the wrong way will hurt performance and stability a lot. Getting that right will take some work, but it is a great opportunity to learn, and will let you build a better-performing and potentially more robust solution.

Unfortunately, since I do not know Apache TS, I do not know whether you would be able to leverage the event system in Apache TS for that. Personally, having used another web server and having encountered the same issues, I steered away from messing with the web server’s event system; too many things can go wrong.

Hi @bateau020, thanks for your detailed explanation. I think I should not have used the term FIFO here. Anyway, I am trying to use async mode now. Since mixing libevent’s event loop with the ATS internal event loop is complicated and uncharted, I plan to use Couchbase with libevent in a two-thread way: create an event base and share it between two threads, call lcb_get in one thread, and run event_base_loop in the other thread. Per the libevent docs (Re: [Libevent-users] event_add from a different thread while event loop is running), it seems an event base can be shared by multiple threads (with only one thread running event_base_loop) if the evthread_use_pthreads API is called. But based on @avsej’s comment “if you are going to run event loop (with the my_get_callback()) in one thread, but schedule the GET operations from the another thread (calling lcb_get(…)), you will get issues” (C API - Are callbacks done synchronously or asynchronously? - #2 by avsej) and the comment on the function lcb_create_libevent_io_opts, “@param base the event base (struct event_base *) to hook use (please note that you shouldn’t reference the event base from multiple threads)” (libcouchbase/plugins/io/libevent/libevent_io_opts.h at master · couchbase/libcouchbase · GitHub), it seems that my two-thread approach can’t work correctly. Do you @bateau020 @avsej have suggestions here? Thanks.

You should not share an event base or lcb_t between threads, as that is not a thread-safe operation. What you should do is create some communication channel between the thread where the libcouchbase connection lives (along with the libevent loop) and your application, so that you never modify internal structures of libcouchbase from a different thread.

The second option is to create an IO plugin for your ATS and pass it to libcouchbase, so that libcouchbase will use your routines to access the network. Examples are here:

As usual, avsej beat me to a reply. Anyway, here’s how I solved it (one of the many possible solutions):

Main elements:

  • one worker thread running the event loop and all the (async) SDK functions
  • many “interface” threads that expose synchronous functions
  • a job queue as communication means in between them.

Details:

  • Only 1 thread runs the event loop and all the SDK functions. Let’s call this the worker thread. It runs the SDK in an async way.
  • Other (by preference many) threads, push jobs onto a MPSC (Multi Producer, Single Consumer) queue. There are some very nice implementations of that out there. If you go bounded, then they can become very fast.
  • Each job posted onto the queue holds a pointer to a return buffer.
  • Jobs get handled by the worker thread, and replies are posted back to the mentioned per-job return buffer.
  • The original job poster blocks until it sees a reply in that return buffer. That makes it behave synchronously.

Now for the finer details:

  • Choose your thread sync mechanisms wisely. Especially in a multiprocessor situation, where syncing between processors can be touchy. Look at memory barriers and so on.
  • Make sure that you do not block indefinitely waiting for a reply. Integrate timeouts.
  • But that introduces another problem: make sure you do not free memory that can still be written to. In the above layout, this means that the ‘return buffer’ should only be freed by the last party accessing it; who that is depends on whether the timeout occurred.
  • Those synchronous functions might also be a good place to introduce some lower level error handling: identify fatal issues (like out-of-memory), and retry on temporary issues (with logarithmic back-off)
  • Those synchronous functions might also be a good place to introduce higher level workflows, like append-or-create, or get-or-get_replicate, or read-and-modify-with-CAS.
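
For what it’s worth, a skeleton of the job/return-buffer handoff described above (assuming pthreads; cb_job_t, job_queue_push and complete_job are invented names for this sketch, and the MPSC queue itself is left out):

#include <string.h>
#include <pthread.h>

typedef struct {
    const char *key;          /* the request */
    char result[256];         /* per-job return buffer */
    int done;                 /* set by the worker when the reply lands */
    pthread_mutex_t mtx;
    pthread_cond_t cv;
} cb_job_t;

void job_queue_push(cb_job_t *job); /* the MPSC queue, defined elsewhere */

/* Interface thread: push the job, then block until the worker thread
 * (which owns lcb_t and the event loop) fills in the return buffer. */
int sync_get(cb_job_t *job)
{
    job_queue_push(job);
    pthread_mutex_lock(&job->mtx);
    while (!job->done)                        /* use a timed wait in real code */
        pthread_cond_wait(&job->cv, &job->mtx);
    pthread_mutex_unlock(&job->mtx);
    return 0;
}

/* Worker thread, e.g. from the lcb get callback: post the reply back. */
void complete_job(cb_job_t *job, const char *value)
{
    pthread_mutex_lock(&job->mtx);
    strncpy(job->result, value, sizeof(job->result) - 1);
    job->done = 1;
    pthread_cond_signal(&job->cv);
    pthread_mutex_unlock(&job->mtx);
}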

@bateau020 In your solution, the worker thread needs to fetch jobs from the MPSC queue and run the event loop. Does your worker thread work in the following way? If not, could you give some idea of how the worker thread integrates the job fetching with the event loop?
while (true) {
    fetch a job from the queue;
    lcb_get(job);
    event_base_loop(evbase, EVLOOP_NONBLOCK);
}

@myao Well, you hit a weak point. That is indeed what I did, and if you do that, you will need some way to prevent CPU hogging when there is no work. I did the very bad thing of just adding a variable, very short sleep (via nanosleep or usleep), depending on load. Nothing breaks; it is just not able to reach the utmost peak speed. But still not good, I know…

MUCH better would be to integrate that MPSC queue into the event loop mechanism. In my use case, the effort needed to rewrite the queue was not worth the gain: I can handle more than 10 times the ops/sec that I need, so I let that slide.
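
For reference, the usual way to do that integration with libevent is a self-pipe that wakes the loop (a sketch; the queue draining and lcb scheduling are left as comments):

#include <unistd.h>
#include <event2/event.h>
#include <event2/util.h>

static int wakeup_pipe[2]; /* [0] read end owned by the loop, [1] producers */

/* Runs inside the event loop whenever a producer pokes the pipe. */
static void on_wakeup(evutil_socket_t fd, short what, void *arg)
{
    char buf[64];
    (void)what; (void)arg;
    while (read(fd, buf, sizeof buf) > 0)
        ; /* drain wakeup bytes (read end is non-blocking) */
    /* drain the MPSC queue here and schedule lcb_get() etc. */
}

/* Worker thread setup: register the pipe, then block in the loop
 * instead of polling with a sleep. */
void run_worker(struct event_base *evbase)
{
    struct event *ev;

    if (pipe(wakeup_pipe) != 0)
        return;
    evutil_make_socket_nonblocking(wakeup_pipe[0]);

    ev = event_new(evbase, wakeup_pipe[0], EV_READ | EV_PERSIST,
                   on_wakeup, NULL);
    event_add(ev, NULL);
    event_base_dispatch(evbase); /* sleeps until there is work; no busy loop */
}

/* Producer threads call this after pushing a job onto the queue. */
void notify_worker(void) { (void)write(wakeup_pipe[1], "x", 1); }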