Perfomance issue with N1QL self join

singh_siddhu · February 3, 2016, 1:56pm

Hi,
I am suffering with the performance of N1QL query where I have a self join on a key which is indexed.
There are two kinds of documents (A & B say ) with in same bucket. Both has the common key to be used to join them. B is having some columns which are not in A and I am using aggregate functions ( sum for e.g ) on B’s document.
I have 2 million documents in 8GB each node ( 3 * 8 GB per cluster ). So there is enough space.
I am getting output in sever minutes like 22 minutes.
I have also index created on the join key.
What could be the issue.
Cheers,
Siddhartha

geraldss · February 3, 2016, 4:14pm

Please post the query. Thanks.

singh_siddhu · February 4, 2016, 7:19am

Query

select
IFNULL(N.`No_`,"") AS `No_`,
IFNULL(N.`Name`,"") AS `Name`,
IFNULL(N.`Income_Balance`,0) AS Income_Balance,
IFNULL(N.`Account Type`,0) AS AccountType,
IFNULL(N.`Totaling` ,"") AS `Totaling`,
IFNULL(N.`Gen_ Bus_ Posting Group` ,"") AS `Gen_ Bus_ Posting Group`,
IFNULL(N.`Gen_ Prod_ Posting Group` ,"") AS `Gen_ Prod_ Posting Group`,
IFNULL(N.`Consol_ Debit Acc_` ,"") AS `Consol_ Debit Acc_`,
IFNULL(N.`Consol_ Credit Acc_` ,"") AS `Consol_ Credit Acc_`,
IFNULL(N.`Consol_ Translation Method`,0) AS `Consol_ Translation Method`,
IFNULL(N.`Cost Type No_` ,"") AS `Cost Type No_`,
IFNULL(N.`Default IC Partner G_L Acc_ No` ,"") AS `Default IC Partner G_L Acc_ No`,
IFNULL(N.`Direct Posting`,0) AS `Direct Posting`,
IFNULL(N.`VAT Bus_ Posting Group` ,"") AS `VAT Bus_ Posting Group`,
IFNULL(N.`VAT Prod_ Posting Group` ,"") AS `VAT Prod_ Posting Group`,
IFNULL(N.`Search Name` ,"") AS `Search Name`,
IFNULL(N.`Global Dimension 1 Code` ,"") AS `Global Dimension 1 Code`,
IFNULL(N.`Global Dimension 2 Code` ,"") AS `Global Dimension 2 Code`,
IFNULL(N.`Debit_Credit`,0) AS `Debit_Credit`,
IFNULL(N.`No_ 2` ,"") AS `No_ 2`,
IFNULL(N.`Blocked`,0) AS `Blocked`,
IFNULL(N.`Reconciliation Account`,0) AS `Reconciliation Account`,
IFNULL(N.`New Page`,0) AS `New Page`,
IFNULL(N.`No_ of Blank Lines`,0) AS `No_ of Blank Lines`,
IFNULL(N.`Indentation`,0) AS `Indentation`,
IFNULL(N.`Last Date Modified`,“1753-01-01 00:00:00”) AS `Last Date Modified`,
IFNULL(N.`Automatic Ext_ Texts`,0) AS `Automatic Ext_ Texts`,
IFNULL(N.`Tax Area Code` ,"") AS `Tax Area Code`,
IFNULL(N.`Tax Liable`,0) AS `Tax Liable`,
IFNULL(N.`Tax Group Code` ,"") AS `Tax Group Code`,
IFNULL(N.`Exchange Rate Adjustment`,0) AS `Exchange Rate Adjustment`,
IFNULL(N.`Service Tax Group Code` ,"") AS `Service Tax Group Code`,
IFNULL(N.`FBT Group Code` ,"") AS `FBT Group Code`,
IFNULL(N.`Excise Prod_ Posting Group` ,"") AS `Excise Prod_ Posting Group`,
IFNULL(N.`Capital Item`,0) AS `Capital Item`,
IFNULL(N.`Group Balances`,0) AS `Group Balances`,
IFNULL(N.`Deferred G_L` ,"") AS `Deferred G_L`,
IFNULL(N.`TDS Applicable`,0) AS `TDS Applicable`,
IFNULL(Sum(X.`Debit Amount`),0) as `Debit Amount`,
IFNULL(Sum(X.`Credit Amount`),0) as `Credit Amount`,
IFNULL(Sum(X.`Net Change`),0) as `Net Change`,
IFNULL(Sum(X.`Balance`),0) as Balance,
IFNULL(Sum(X.`Balance at Date`),0) as `Balance at Date`,
IFNULL(Sum(X.`Add_-Currency Balance at Date`),0) as `Add_-Currency Balance at Date`,
IFNULL(Sum(X.`Additional-Currency Balance`),0) as `Additional-Currency Balance`,
IFNULL(Sum(X.`Additional-Currency Net Change`),0) as `Additional-Currency Net Change`
from NAV X
Join NAV N
On KEYS [X.No_]
Where N.`No_` = '122537’
Group by X.No_

and we even if I don’t do join here. Query is simply taking munutes ( 16m ) to execute.
I have Primary index created on No_
Please do let me know if you see something obvious here.
indexer is taking 200% cpu on 3 nodes and cbq-engine also takes 150% cpu

singh_siddhu · February 4, 2016, 12:48pm

My server is Version: 4.1.0-5005 Enterprise Edition (build-5005)
when I do ‘top’ I could see indexer is always running consuming 100% of cpu ? why is it so.

singh_siddhu · February 7, 2016, 5:48pm

Any solution guys ? Will really leave N1QL as solution if this is not going to give me sub second response.

vmx · February 7, 2016, 8:07pm

Hi Siddhartha,

I’m not good with N1QL, so I might be wrong, but it looks like you could also use a mapreduce view to solve your problem. I also don’t know what at Date does, so I don’t know if mapreduce views can do it. But for the rest, it would look like that:

The map function:

function(doc) {
    // 1 == Debit Amount
    // 2 == Credit Amount
    // 3 == Net Change
    // 4 == Balance
    // 5 == Additional-Currency Balance
    // 6 == Additional-Currency Net Change
    if (doc['Debit Amount'] !== undefined) {
        emit([doc.No_, 1], doc['Debit Amount']);
    }
    if (doc['Credit Amount'] !== undefined) {
        emit([doc.No_, 2], doc['Credit Amount']);
    }
    ...
}

Your reduce function would be _sum.

Your query parameters would then be:

?startkey=["122537",null]&endkey=["122537",{}]&group=true

Cheers,
Volker

geraldss · February 7, 2016, 8:55pm

Hi @singh_siddhu,

Can you tell us which document has No_ as a primary key, and which one has No_ as a foreign key? I masking about N and X. That is, do the N documents have No_ as primary key, or do the X documents have No_ as a primary key?

Gerald

singh_siddhu · February 8, 2016, 8:37am

In N it is primary key, and X it is foreign key that is why aggregate functions are applied on X’s column because there would be multiple rows for each Key of N in X . i.e., 1 : N ratio between N and X documents.

geraldss · February 8, 2016, 6:35pm

Ok, thank you. We are taking a look.

keshav_m · February 8, 2016, 7:30pm

Hi Siddhu,

You’re joining NAV X JOIN NAV N.
You have a filter on “NAV N”: WHERE N.No_ = ‘122537’
You have a primary index created on the bucket.

That means, the engine (query plan) will scan the full bucket NAV X and for each document, it looks for the field X.No_ and tries to match with “NAV N”.
The filter N.No_ = ‘122537’ is applies ONLY AFTER THE JOIN.

To improve the performance, you’ve to filter the documents early (during the scan of NAV X).

If X.No_ is the foreign key for N.No_ and the document ID has the same value,
1. You can simply apply the predicate to X.No.
2. Create a secondary index on NAV(No_)

CREATE INDEX NAVNo_ on NAV(No_);

If you have type field defined within each doc to differentiate between parent & child, you can can use that in the query and add WHERE clause to the index as well.

CREATE INDEX NAVNo_ on NAV(No_) WHERE type = ‘X’;
SELECT blah…
FROM NAV X JOIN NAV N ON KEYS [X.No_]
WHERE X.type = ‘X’ and X.No_ = 122537’
GROUP BY X.No_

singh_siddhu · February 9, 2016, 6:07am

I will try this as well. But didn’t understand this “and the document ID has the same value”, how can two different documents will have document ID same ? I already described earlier that keys are unique for each document.

singh_siddhu · February 9, 2016, 8:13am

Also how to make sure that out of 3 nodes in the cluster only 2 are data nodes and 1 is for query ?

singh_siddhu · February 9, 2016, 9:41am

Above mentioned help didn’t work out. It is taking 34 seconds for just 50K documents.

daschl · February 9, 2016, 9:57am

When you add a node to the cluster, you can choose which services you want to enable on it. There you can specify Data and/or Index and/or Query. For the rest of the performance related work I leave you in the capable hands of @keshav_m and @geraldss.

singh_siddhu · February 9, 2016, 10:19am

Thanks @daschl , this is a great help. Now I am going to do that first. Cheers

singh_siddhu · February 9, 2016, 2:03pm

So now I have 1 data node , 1 query node and 1 index node. Still I am getting result set which is too slow.
Total 50K documents, total time it took : 12.267031445s

There is no join now. Querying using only 1 type of document

Query :
select
X.No_ AS No_,
IFNULL(Sum(X.Balance),0) as Balance
from NAV X
Where X.Type = “GLEntry”
Group by X.No_

Indexes are :
a) CREATE PRIMARY INDEX NAV_No_ ON NAV USING GSI WITH {“nodes”:“172.16.47.62:8091”};
b) CREATE INDEX idx_GLE_Balance ON NAV(Balance) USING GSI WITH {“nodes”:“172.16.47.62:8091”};
c) CREATE INDEX idx_GLE_Type ON NAV(Type) WHERE (Type = “GLEntry”) USING GSI WITH {“nodes”:“172.16.47.62:8091”};

Query I am running using workbench :
./launch-cbq-gui.sh
Running: ./cbq-gui -datastore=http://172.16.47.60:8091 -localPort=:8094 -user= -pass=
Launching query web service.
Using CB Server at: http://172.16.47.60:8091
Using N1QL query service on: 172.16.47.60:8093
Using memcached service on: 172.16.47.60:11210
Using web content at: ./static
Launching UI server, to use, point browser at http://localhost:8094
Hit enter to stop server:

=====
Just wanted to know if this is the best I can get from N1QL ? Please let me know your input @geraldss @keshav_m.

geraldss · February 9, 2016, 3:35pm

@singh_siddhu, did you get on Skype with @keshav_m?

singh_siddhu · February 10, 2016, 5:40am

sorry don’t know his skype id. I have sent mine to him in personal message @geraldss, @keshav_m.

keshav_m · February 10, 2016, 6:19am

Hi @singh_siddhu,

I sent you another message yesterday with the info yesterday.
my skypeid is: rkeshavmurthy

Here’s the advice for you to get the best performance.

CREATE INDEX idx_gle_type_balance ON NAV(No_, Balance) WHERE (Type = ‘GLEntry’) USING GSI;

Then you issue:
select
X.No_ AS No_,
IFNULL(Sum(X.Balance),0) as Balance
from NAV X
Where X.Type = “GLEntry”
AND X.Balance IS NOT MISSING
AND X.No_ IS NOT MISSING
Group by X.No_;

Remember, in this case, you’re doing the full index scan followed by grouping & aggregation.
The cost of executing this query is proportional to number of entries in the index.

Please post the EXPLAIN when you post the results next.

singh_siddhu · February 10, 2016, 6:55am

Sure @keshav_m, will do that and let you know.

Topic		Replies	Views
N1QL Query Performance Issue - Declines on volume of data fetch SQL++	15	1105	December 15, 2019
Perfomance issues SQL++ query	7	2489	January 31, 2016
N1ql query is slow in join even indexes are created SQL++ query , n1ql	3	561	August 14, 2022
Slow N1QL performance on covered queries SQL++ query , n1ql , index	5	2115	August 8, 2017
Slow N1QL query Performance with a join SQL++	2	1543	January 14, 2022

Perfomance issue with N1QL self join

Related topics