Hi,
I am suffering with the performance of N1QL query where I have a self join on a key which is indexed.
There are two kinds of documents (A & B say ) with in same bucket. Both has the common key to be used to join them. B is having some columns which are not in A and I am using aggregate functions ( sum for e.g ) on B’s document.
I have 2 million documents in 8GB each node ( 3 * 8 GB per cluster ). So there is enough space.
I am getting output in sever minutes like 22 minutes.
I have also index created on the join key.
What could be the issue.
Cheers,
Siddhartha
Please post the query. Thanks.
Query
select
IFNULL(N.No_
,"") AS No_
,
IFNULL(N.Name
,"") AS Name
,
IFNULL(N.Income_Balance
,0) AS Income_Balance,
IFNULL(N.Account Type
,0) AS AccountType,
IFNULL(N.Totaling
,"") AS Totaling
,
IFNULL(N.Gen_ Bus_ Posting Group
,"") AS Gen_ Bus_ Posting Group
,
IFNULL(N.Gen_ Prod_ Posting Group
,"") AS Gen_ Prod_ Posting Group
,
IFNULL(N.Consol_ Debit Acc_
,"") AS Consol_ Debit Acc_
,
IFNULL(N.Consol_ Credit Acc_
,"") AS Consol_ Credit Acc_
,
IFNULL(N.Consol_ Translation Method
,0) AS Consol_ Translation Method
,
IFNULL(N.Cost Type No_
,"") AS Cost Type No_
,
IFNULL(N.Default IC Partner G_L Acc_ No
,"") AS Default IC Partner G_L Acc_ No
,
IFNULL(N.Direct Posting
,0) AS Direct Posting
,
IFNULL(N.VAT Bus_ Posting Group
,"") AS VAT Bus_ Posting Group
,
IFNULL(N.VAT Prod_ Posting Group
,"") AS VAT Prod_ Posting Group
,
IFNULL(N.Search Name
,"") AS Search Name
,
IFNULL(N.Global Dimension 1 Code
,"") AS Global Dimension 1 Code
,
IFNULL(N.Global Dimension 2 Code
,"") AS Global Dimension 2 Code
,
IFNULL(N.Debit_Credit
,0) AS Debit_Credit
,
IFNULL(N.No_ 2
,"") AS No_ 2
,
IFNULL(N.Blocked
,0) AS Blocked
,
IFNULL(N.Reconciliation Account
,0) AS Reconciliation Account
,
IFNULL(N.New Page
,0) AS New Page
,
IFNULL(N.No_ of Blank Lines
,0) AS No_ of Blank Lines
,
IFNULL(N.Indentation
,0) AS Indentation
,
IFNULL(N.Last Date Modified
,“1753-01-01 00:00:00”) AS Last Date Modified
,
IFNULL(N.Automatic Ext_ Texts
,0) AS Automatic Ext_ Texts
,
IFNULL(N.Tax Area Code
,"") AS Tax Area Code
,
IFNULL(N.Tax Liable
,0) AS Tax Liable
,
IFNULL(N.Tax Group Code
,"") AS Tax Group Code
,
IFNULL(N.Exchange Rate Adjustment
,0) AS Exchange Rate Adjustment
,
IFNULL(N.Service Tax Group Code
,"") AS Service Tax Group Code
,
IFNULL(N.FBT Group Code
,"") AS FBT Group Code
,
IFNULL(N.Excise Prod_ Posting Group
,"") AS Excise Prod_ Posting Group
,
IFNULL(N.Capital Item
,0) AS Capital Item
,
IFNULL(N.Group Balances
,0) AS Group Balances
,
IFNULL(N.Deferred G_L
,"") AS Deferred G_L
,
IFNULL(N.TDS Applicable
,0) AS TDS Applicable
,
IFNULL(Sum(X.Debit Amount
),0) as Debit Amount
,
IFNULL(Sum(X.Credit Amount
),0) as Credit Amount
,
IFNULL(Sum(X.Net Change
),0) as Net Change
,
IFNULL(Sum(X.Balance
),0) as Balance,
IFNULL(Sum(X.Balance at Date
),0) as Balance at Date
,
IFNULL(Sum(X.Add_-Currency Balance at Date
),0) as Add_-Currency Balance at Date
,
IFNULL(Sum(X.Additional-Currency Balance
),0) as Additional-Currency Balance
,
IFNULL(Sum(X.Additional-Currency Net Change
),0) as Additional-Currency Net Change
from NAV X
Join NAV N
On KEYS [X.No_]
Where N.No_
= '122537’
Group by X.No_
and we even if I don’t do join here. Query is simply taking munutes ( 16m ) to execute.
I have Primary index created on No_
Please do let me know if you see something obvious here.
indexer is taking 200% cpu on 3 nodes and cbq-engine also takes 150% cpu
My server is Version: 4.1.0-5005 Enterprise Edition (build-5005)
when I do ‘top’ I could see indexer is always running consuming 100% of cpu ? why is it so.
Any solution guys ? Will really leave N1QL as solution if this is not going to give me sub second response.
Hi Siddhartha,
I’m not good with N1QL, so I might be wrong, but it looks like you could also use a mapreduce view to solve your problem. I also don’t know what at Date
does, so I don’t know if mapreduce views can do it. But for the rest, it would look like that:
The map function:
function(doc) {
// 1 == Debit Amount
// 2 == Credit Amount
// 3 == Net Change
// 4 == Balance
// 5 == Additional-Currency Balance
// 6 == Additional-Currency Net Change
if (doc['Debit Amount'] !== undefined) {
emit([doc.No_, 1], doc['Debit Amount']);
}
if (doc['Credit Amount'] !== undefined) {
emit([doc.No_, 2], doc['Credit Amount']);
}
...
}
Your reduce function would be _sum
.
Your query parameters would then be:
?startkey=["122537",null]&endkey=["122537",{}]&group=true
Cheers,
Volker
Hi @singh_siddhu,
Can you tell us which document has No_ as a primary key, and which one has No_ as a foreign key? I masking about N and X. That is, do the N documents have No_ as primary key, or do the X documents have No_ as a primary key?
Gerald
In N it is primary key, and X it is foreign key that is why aggregate functions are applied on X’s column because there would be multiple rows for each Key of N in X . i.e., 1 : N ratio between N and X documents.
Ok, thank you. We are taking a look.
Hi Siddhu,
You’re joining NAV X JOIN NAV N.
You have a filter on “NAV N”: WHERE N.No_ = ‘122537’
You have a primary index created on the bucket.
That means, the engine (query plan) will scan the full bucket NAV X and for each document, it looks for the field X.No_ and tries to match with “NAV N”.
The filter N.No_ = ‘122537’ is applies ONLY AFTER THE JOIN.
To improve the performance, you’ve to filter the documents early (during the scan of NAV X).
If X.No_ is the foreign key for N.No_ and the document ID has the same value,
1. You can simply apply the predicate to X.No.
2. Create a secondary index on NAV(No_)
CREATE INDEX NAVNo_ on NAV(No_);
If you have type field defined within each doc to differentiate between parent & child, you can can use that in the query and add WHERE clause to the index as well.
CREATE INDEX NAVNo_ on NAV(No_) WHERE type = ‘X’;
SELECT blah…
FROM NAV X JOIN NAV N ON KEYS [X.No_]
WHERE X.type = ‘X’ and X.No_ = 122537’
GROUP BY X.No_
I will try this as well. But didn’t understand this “and the document ID has the same value”, how can two different documents will have document ID same ? I already described earlier that keys are unique for each document.
Also how to make sure that out of 3 nodes in the cluster only 2 are data nodes and 1 is for query ?
Above mentioned help didn’t work out. It is taking 34 seconds for just 50K documents.
When you add a node to the cluster, you can choose which services you want to enable on it. There you can specify Data and/or Index and/or Query. For the rest of the performance related work I leave you in the capable hands of @keshav_m and @geraldss.
Thanks @daschl , this is a great help. Now I am going to do that first. Cheers
So now I have 1 data node , 1 query node and 1 index node. Still I am getting result set which is too slow.
Total 50K documents, total time it took : 12.267031445s
There is no join now. Querying using only 1 type of document
Query :
select
X.No_
AS No_
,
IFNULL(Sum(X.Balance
),0) as Balance
from NAV X
Where X.Type
= “GLEntry”
Group by X.No_
Indexes are :
a) CREATE PRIMARY INDEX NAV_No_
ON NAV
USING GSI WITH {“nodes”:“172.16.47.62:8091”};
b) CREATE INDEX idx_GLE_Balance
ON NAV
(Balance
) USING GSI WITH {“nodes”:“172.16.47.62:8091”};
c) CREATE INDEX idx_GLE_Type
ON NAV
(Type
) WHERE (Type
= “GLEntry”) USING GSI WITH {“nodes”:“172.16.47.62:8091”};
Query I am running using workbench :
./launch-cbq-gui.sh
Running: ./cbq-gui -datastore=http://172.16.47.60:8091 -localPort=:8094 -user= -pass=
Launching query web service.
Using CB Server at: http://172.16.47.60:8091
Using N1QL query service on: 172.16.47.60:8093
Using memcached service on: 172.16.47.60:11210
Using web content at: ./static
Launching UI server, to use, point browser at http://localhost:8094
Hit enter to stop server:
=====
Just wanted to know if this is the best I can get from N1QL ? Please let me know your input @geraldss @keshav_m.
@singh_siddhu, did you get on Skype with @keshav_m?
Hi @singh_siddhu,
I sent you another message yesterday with the info yesterday.
my skypeid is: rkeshavmurthy
Here’s the advice for you to get the best performance.
- CREATE INDEX idx_gle_type_balance ON NAV(No_, Balance) WHERE (Type = ‘GLEntry’) USING GSI;
Then you issue:
select
X.No_ AS No_,
IFNULL(Sum(X.Balance),0) as Balance
from NAV X
Where X.Type = “GLEntry”
AND X.Balance IS NOT MISSING
AND X.No_ IS NOT MISSING
Group by X.No_;
Remember, in this case, you’re doing the full index scan followed by grouping & aggregation.
The cost of executing this query is proportional to number of entries in the index.
Please post the EXPLAIN when you post the results next.
Sure @keshav_m, will do that and let you know.