Couchbase analytics performance and query queue issue

Hello everyone,

We are testing the Couchbase analytics service and ran into an issue. When making several requests within a short period of time (1 second), the requests go into a queue, causing the execution time to increase to unacceptable levels. We tried increasing the RAM and CPU limits, and also used clustering (3 servers), but this did not help.

Our test query involves a collection with 130,000 documents, joined with a collection of 2,000 documents, then joined with another collection of 200 documents, and includes sorting. We noticed that the query execution time significantly increases when using JOIN, regardless of the size of the joined collection.

Could someone explain how the Couchbase analytics service handles request queues? How is the queue formed, and what can be done to improve the performance of the service? Also, are there any possible mistakes in configuring RAM, CPU limits, or clustering?

Thank you in advance for your help!

When making several requests

Please show how you are making these requests.

Our test query

Please show the query.

Hi @mreiche !

for (let i = 0; i < cyclesCount; i++) {
  await Promise.all(
    clusters.map(async (cluster, index) => {
      try {
        const { alias } =
          aliases[Math.floor(Math.random() * 10)] || aliases[0];

        if (!cluster) {
          console.error("Couchbase connection is missing");
          return;
        }

        await timeout(Math.random() * 500);

        const result = await cluster.analyticsQuery(
          `
          SELECT VALUE {'en': doc.data.en, 'lang': IFMISSING(doc.data.lang, doc.data.en), 'doc_id': doc.id, 'orderBy': ARRAY_LENGTH(OBJECT_VALUES(doc._orderBy)) > 0}
          FROM main.service.web doc
              JOIN main.service.providers_web doc_pr ON doc_pr.data.en.alias = doc.data.en.provider.alias
              AND doc_pr.data.en.provider_view = TRUE
              AND doc_pr._scope = 'general_scope'
              AND (ARRAY_LENGTH(doc_pr.data.en.allowed) = 0
                  OR ARRAY_CONTAINS(doc_pr.data.en.allowed, 'XX'))
              AND (ARRAY_LENGTH(doc_pr.data.en.restricted) = 0
                  OR ARRAY_CONTAINS(doc_pr.data.en.restricted, 'XX') = FALSE)
              AND (IFMISSING(ARRAY_LENGTH(doc_pr.data.en.currencies), 0) = 0
                  OR ARRAY_CONTAINS(doc_pr.data.en.currencies, 'XXX') = FALSE)
              JOIN main.service.categories_web doc_cat ON doc_cat.data.en.alias IN [ 'category' ]
              AND doc_cat.data.en.category_view = TRUE
              AND doc_cat._scope = 'general_scope'
              AND (ARRAY_LENGTH(doc_cat.data.en.allowed) = 0
                  OR ARRAY_CONTAINS(doc_cat.data.en.allowed, 'XX'))
              AND (ARRAY_LENGTH(doc_cat.data.en.restricted) = 0
                  OR ARRAY_CONTAINS(doc_cat.data.en.restricted, 'XX') = FALSE)
          WHERE (ARRAY_LENGTH(ARRAY_INTERSECT(ARRAY_STAR(doc.data.en.categories).alias, [ '${alias}' ])) > 0
                  AND ARRAY_CONTAINS(doc.data.en.platforms, 'web') = TRUE
                  AND doc._scope = 'general_scope')
          ORDER BY orderBy DESC,
                  IFMISSING(doc._orderBy.featured_priority_new, IFMISSING(doc._orderBy.priority_new, IFMISSING(doc._orderBy.featured_priority, IFMISSING(doc._orderBy.priority, IFMISSING(doc._orderBy.recently_updated, IFMISSING(doc._orderBy.new_entry, IFMISSING(doc._orderBy.default_priority, doc._orderBy.base_priority))))))) ASC,
                  doc.updated_at DESC
          LIMIT 24
          `,
          {
            scanConsistency: AnalyticsScanConsistency.NotBounded,
          }
        );

      } catch (error) {

      ...

Use one ‘cluster’ for all the queries. The SDK will round-robin the requests to the analytics nodes and use multiple connections to nodes as needed.
The more concurrent queries, the longer the response time. To decrease the response time, investigate optimizing the query and increasing the number of analytics nodes. The “Explain” button on the Analytics Workbench will show the analytics query plan which may be useful for optimizing.
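
For illustration, a minimal sketch of that single-cluster pattern with the Node.js SDK (the connection string, credentials, and the runQueries helper are placeholders, not taken from the code above):

const couchbase = require("couchbase");

// One cluster object, created once and shared by all concurrent queries.
// The SDK itself spreads analytics requests across the analytics nodes.
async function runQueries(statements) {
  const cluster = await couchbase.connect("couchbase://db-host", {
    username: "Administrator",
    password: "password",
  });

  // Fire the queries concurrently against the same cluster object.
  const results = await Promise.all(
    statements.map((statement) => cluster.analyticsQuery(statement))
  );

  await cluster.close();
  return results;
}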

Since [ '${alias}' ] is always a single element, it seems like ARRAY_CONTAINS could be used instead of ARRAY_INTERSECT (?)

(ARRAY_LENGTH(ARRAY_INTERSECT(ARRAY_STAR(doc.data.en.categories).alias, [ '${alias}' ]))
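
Something roughly like this, as an untested sketch (assuming ARRAY_STAR(doc.data.en.categories).alias still yields the array of alias values):

ARRAY_CONTAINS(ARRAY_STAR(doc.data.en.categories).alias, '${alias}')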

@mreiche thanks for the answer!

The requests over multiple connections are mostly experimental; they were aimed at understanding how Couchbase handles analytics queries, and using multiple connections has almost no effect on query execution time. As for optimizing the query: real queries will use an array of values; the single value is only used for testing.

During testing we noticed that Couchbase processes only 2-3 of these queries in parallel. We created a 3-node cluster, rebalanced it, and increased the RAM/CPU limits and quotas, but these changes only made things worse.

The point of this discussion is to understand how the analytics service processes queries and to identify any mistakes in how we scale it (changing RAM/CPU quotas, adding nodes).

We suspect that Couchbase evaluates the resources required to process incoming queries before executing them: it calculates how many queries can run in parallel based on the RAM/CPU quotas, and the remaining queries are placed in a queue. However, the mechanism for calculating the number of workers, the RAM required to execute a query, and how the load is distributed between cluster nodes is unclear. We need to understand this to figure out why scaling brings no improvement.

Hi @Maxim

This blog post will be helpful here. The number of concurrent queries the Analytics service will execute is determined by a combination of your cluster resources and the complexity of the queries.

Scaling up RAM, node counts, and core counts should increase the possible concurrency, and I am surprised you found that not to be the case.

There are tunables that will increase the number of query workers (coresMultiplier) and ones that will reduce the amount of RAM allocated to a query, but per the blog post, this can backfire as it’s forcing the cluster to do more work than it may be capable of. Something to experiment with in your testing I’d say.
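
For what it's worth, here is a rough sketch of changing that tunable through the Analytics admin REST API (assuming an Analytics node reachable on port 8095, Node 18+ for the built-in fetch, and full-admin credentials; the endpoint and parameter name should be checked against the documentation for your server version, and the Analytics cluster has to be restarted for the change to take effect):

// Sketch: bump the Analytics coresMultiplier, then restart the Analytics cluster.
// Host and credentials below are placeholders.
const ANALYTICS_HOST = "http://analytics-node:8095";
const auth = "Basic " + Buffer.from("Administrator:password").toString("base64");

async function setCoresMultiplier(value) {
  // Service-level config endpoint (PUT with form-encoded parameters).
  const res = await fetch(`${ANALYTICS_HOST}/analytics/config/service`, {
    method: "PUT",
    headers: {
      Authorization: auth,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: `coresMultiplier=${value}`,
  });
  if (!res.ok) throw new Error(`Config update failed: ${res.status}`);

  // The new value only takes effect after the Analytics cluster restarts.
  const restart = await fetch(`${ANALYTICS_HOST}/analytics/cluster/restart`, {
    method: "POST",
    headers: { Authorization: auth },
  });
  if (!restart.ok) throw new Error(`Restart failed: ${restart.status}`);
}

setCoresMultiplier(3).catch(console.error);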
