Hi
In my configuration I have a cluster with 5 server nodes, and I have a question:
- Is it suggested that every server node has a #primary index?
- My queries use an index (e.g.: CREATE INDEX indexTest ON tests(field1, field2) USING GSI).
Is it also recommended that this index be on all nodes in the cluster?
Are there tips to optimize N1QL queries and make them scale across all server nodes?
Thanks for the information
Hello Alessandro,
If you have a 5-node cluster, the way to scale up is to isolate the index and query services from the data nodes, using Couchbase's MDS option.
If you have a high number of queries using the index CREATE INDEX indexTest ON tests(field1, field2) USING GSI, you can duplicate this index on two or more nodes, to balance index lookup access and for HA.
There are more factors to consider here. For example, query cardinality (that is, the number of rows the query processes), concurrency, data change frequency, and so on.
I think that if your queries use field1 and field2, a primary index is not necessary, unless you also query by attributes other than field1 and field2.
And the best way to tune N1QL is to write precise queries that access very specific data. You have the EXPLAIN command to see the execution plan and how Couchbase executes the query.
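As a sketch of the duplication idea: GSI lets you place an index on a specific index node with the WITH clause, so you can create an equivalent index under a different name on a second node (the hostname below is a placeholder for one of your cluster nodes):

```sql
-- Same key definition, different name, placed on a second index node for HA
-- (hostname/port are placeholders; use one of your own index nodes)
CREATE INDEX indexTest_copy ON tests(field1, field2)
USING GSI
WITH {"nodes": ["node2.example.com:8091"]};
```

With two equivalent indexes, the query service can balance index lookups between them, and one can serve requests if the other node goes down.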
Regards
Arturo
Thanks for the info,
I use Couchbase 4.0 Community Edition; can I use the Couchbase MDS option?
My system has about 20 million entries, and my queries filter on two different fields with a GROUP BY and an ordered value.
Thanks again for your response and your advice.
Hi,
According to this blog entry: http://blog.couchbase.com/2016/june/couchbase-server-editions-explained-open-source-community-edition-and-enterprise-edition
MDS is an Enterprise feature.
…
You can use this query to diagnose N1QL response times:
select * from system:completed_requests;
or
select * from system:active_requests;
These system keyspaces provide information about all N1QL requests and their execution metrics.
You can then query this info and analyze the N1QL statements with the highest elapsed times, to see the N1QL text and request EXPLAIN information, and to review the index definitions.
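For example, a sketch to surface the slowest statements first (this assumes a release where system:completed_requests exposes elapsedTime as a duration string like "48.2s"; verify the field names and the STR_TO_DURATION function on your version):

```sql
-- Ten slowest completed requests; elapsedTime is a duration string,
-- so convert it to a numeric duration to sort correctly
SELECT requestId, statement, elapsedTime, serviceTime
FROM system:completed_requests
ORDER BY STR_TO_DURATION(elapsedTime) DESC
LIMIT 10;
```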
There are a lot of people on this forum who can help you improve N1QL performance.
I hope this helps.
Arturo
arturo.gutierrez.gom, thank you very much!!!
If I add another node to the cluster, will the query time go down?
Right now I have an execution time of 48 seconds…
Hello,
Could you share the N1QL text and execution plan of your query?
You can get the execution plan by simply putting EXPLAIN before the N1QL statement, and the engine will display the info.
Example:
explain select … from …
Arturo
Example for my use:
explain SELECT SUM(operation), (user) FROM bucketTime WHERE enviroment = "/root" AND startTime > 1356998400000 AND endTime < 1483228799999 GROUP BY user ORDER BY operation DESC
[
{
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan",
"index": "indexTest",
"keyspace": "bucketTime",
"limit": 9223372036854776000,
"namespace": "default",
"spans": [
{
"Range": {
"High": [
"successor(\"/root\")"
],
"Inclusion": 0,
"Low": [
"\"/root\"",
"1.3569984e+12"
]
},
"Seek": null
}
],
"using": "gsi"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Fetch",
"keyspace": "bucketTime",
"namespace": "default"
},
{
"#operator": "Filter",
"condition": "((((`bucketTime`.`enviroment`) = \"/root\") and (1.3569984e+12 < (`bucketTime`.`startTime`))) and ((`bucketTime`.`endTime`) < 1.483228799999e+12))"
},
{
"#operator": "InitialGroup",
"aggregates": [
"sum((`bucketTime`.`operation`))"
],
"group_keys": [
"(`bucketTime`.`user`)"
]
}
]
}
},
{
"#operator": "IntermediateGroup",
"aggregates": [
"sum((`bucketTime`.`operation`))"
],
"group_keys": [
"(`bucketTime`.`user`)"
]
},
{
"#operator": "FinalGroup",
"aggregates": [
"sum((`bucketTime`.`operation`))"
],
"group_keys": [
"(`bucketTime`.`user`)"
]
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "sum((`bucketTime`.`operation`))"
},
{
"expr": "(`bucketTime`.`user`)"
}
]
}
]
}
}
]
},
{
"#operator": "Order",
"sort_terms": [
{
"desc": true,
"expr": "(`bucketTime`.`operation`)"
}
]
},
{
"#operator": "FinalProject"
}
]
}
]
Hi,
Can you post the indexTest definition?
Definition: CREATE INDEX indexTest ON bucketTime(enviroment, startTime, endTime) USING GSI
Hello,
I'm not sure, but reading the execution plan it seems the index scan only filters by the first column (enviroment). After that, it fetches all the docs from the bucket and filters the other conditions.
Can you try creating an index only on startTime and endTime, with a WHERE clause filtering by enviroment?
Create index i2 on your-bucket(startTime, endTime) where enviroment = "/root" using gsi;
After that you can execute the query to see if the response time improves, and get the EXPLAIN plan to check whether the new index is used.
You can force the use of this index with USE INDEX at the query level.
Hi
I tried what you wrote:
- create an index with startTime and endTime:
CREATE INDEX indexTest ON bucketTime(startTime, endTime) WHERE (enviroment = "/root") USING GSI
- execute the query using this index
but the result is the same: execution takes 48 sec…
explain SELECT SUM(operation) ,user
FROM bucketTime USE INDEX (indexTest USING GSI)
WHERE enviroment = "/root"
AND startTime > 1356998400000
AND endTime < 1483228799999
GROUP BY user
ORDER BY INDEX DESC
result:
[
{
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan",
"index": "indexTest",
"keyspace": "bucketTime",
"limit": 9223372036854776000,
"namespace": "default",
"spans": [
{
"Range": {
"High": null,
"Inclusion": 0,
"Low": [
"1.3569984e+12"
]
},
"Seek": null
}
],
"using": "gsi"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Fetch",
"keyspace": "bucketTime",
"namespace": "default"
},
{
"#operator": "Filter",
"condition": "((((`bucketTime`.`enviroment`) = \"/root\") and (1.3569984e+12 < (`bucketTime`.`startTime`))) and ((`bucketTime`.`endTime`) < 1.483228799999e+12))"
},
{
"#operator": "InitialGroup",
"aggregates": [
"sum((`bucketTime`.`operation`))"
],
"group_keys": [
"(`bucketTime`.`user`)"
]
}
]
}
},
{
"#operator": "IntermediateGroup",
"aggregates": [
"sum((`bucketTime`.`operation`))"
],
"group_keys": [
"(`bucketTime`.`user`)"
]
},
{
"#operator": "FinalGroup",
"aggregates": [
"sum((`bucketTime`.`operation`))"
],
"group_keys": [
"(`bucketTime`.`user`)"
]
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "sum((`bucketTime`.`operation`))"
},
{
"expr": "(`bucketTime`.`user`)"
}
]
}
]
}
}
]
},
{
"#operator": "Order",
"sort_terms": [
{
"desc": true,
"expr": "(`bucketTime`.`operation`)"
}
]
},
{
"#operator": "FinalProject"
}
]
}
]
Ok.
How many docs does this return:
SELECT COUNT(*) FROM bucket WHERE enviroment = "/root" AND startTime etc etc.
What response time does this query have?
This query executes in 47.20s
and returns 370163 docs.
Hello,
Have you executed this query?
SELECT COUNT(*)
FROM bucketTime USE INDEX (indexTest USING GSI)
WHERE enviroment = "/root"
AND startTime > 1356998400000
AND endTime < 1483228799999 ;
It is strange that it takes so long to execute, because the query is covered by the index.
Can you post the query and the execution plan?
Thanks
Query:
SELECT COUNT(*)
FROM bucketTime USE INDEX (indexTest USING GSI)
WHERE enviroment = "/root"
AND startTime > 1356998400000
AND endTime < 1483228799999 ;
Index:
CREATE INDEX indexTest ON bucketTime(startTime, endTime) WHERE (enviroment = "/root") USING GSI
Execution: 44.15s
"$1": 370304
Explain:
[
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan",
"index": "indexTest",
"keyspace": "bucketTime",
"limit": 9223372036854776000,
"namespace": "default",
"spans": [
{
"Range": {
"High": null,
"Inclusion": 0,
"Low": [
"1.3569984e+12"
]
},
"Seek": null
}
],
"using": "gsi"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Fetch",
"keyspace": "bucketTime",
"namespace": "default"
},
{
"#operator": "Filter",
"condition": "((((`bucketTime`.`enviroment`) = \"/root\") and (1.3569984e+12 < (`bucketTime`.`startTime`))) and ((`bucketTime`.`endTime`) < 1.483228799999e+12))"
},
{
"#operator": "InitialGroup",
"aggregates": [
"count(*)"
],
"group_keys": []
}
]
}
},
{
"#operator": "IntermediateGroup",
"aggregates": [
"count(*)"
],
"group_keys": []
},
{
"#operator": "FinalGroup",
"aggregates": [
"count(*)"
],
"group_keys": []
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "count(*)"
}
]
},
{
"#operator": "FinalProject"
}
]
}
}
]
}
]
Thanks again for the support!!
Hello,
I've reproduced a similar query in my environment.
Bucket HR with 229380 docs.
Query:
select count(*) from HR where type = "emp2" and deptno = 10 and sal > 1000;
type = "emp2" returns 98% of docs.
deptno = 10 returns 33% of docs.
sal > 1000 returns 20%.
If I execute the query, it counts about 50,000 docs in a few ms.
And the execution plan is:
cbq> explain select count(*) from HR where type = "emp2" and deptno = 10 and sal > 1000;
{
"requestID": "038a640c-fa14-4d91-8786-f0f6f22e514a",
"signature": "json",
"results": [
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan",
"covers": [
"cover ((`HR`.`type`))",
"cover ((`HR`.`deptno`))",
"cover ((`HR`.`sal`))",
"cover ((meta(`HR`).`id`))"
],
"index": "i2_type_deptno_sal",
"index_id": "afaa244b1fc36b38",
"keyspace": "HR",
"namespace": "default",
"spans": [
…
Here you can see the operator:
"#operator": "IndexScan",
"covers": [
"cover ((`HR`.`type`))",
"cover ((`HR`.`deptno`))",
"cover ((`HR`.`sal`))",
"cover ((meta(`HR`).`id`))"
]
It applies the WHERE conditions from the index itself, while your plan does not.
This is a big difference, because your query needs to fetch 300K docs from the bucket and then do the COUNT.
I think the best option would be to upgrade your 4.0 to the 4.5.1 release, which is the latest Couchbase production release.
In my case it is 4.5 EE.
The upgrade is very easy: stop the Couchbase service and upgrade the couchbase package that you download from the web.
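As a sketch of what a covering index would look like on a release that supports them (4.5+): include every field the COUNT query references in the index keys, so the plan can skip the Fetch step entirely. Verify with EXPLAIN that a "covers" section appears before relying on it:

```sql
-- All filtered fields are index keys, so COUNT(*) can be answered
-- from the index alone, without fetching 300K documents
CREATE INDEX idx_env_time ON bucketTime(enviroment, startTime, endTime) USING GSI;

SELECT COUNT(*)
FROM bucketTime
WHERE enviroment = "/root"
  AND startTime > 1356998400000
  AND endTime < 1483228799999;
```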
Regards
Arturo
Thanks for the advice, but I currently use the Community Edition.
So the performance problem may depend precisely on this.
I have 5 server nodes in my cluster, where every node has the following services enabled:
Data
Index
Query
Would adding two machines with only the query and index services enabled improve performance?
Thanks for the answer
Hello,
If you add more nodes, you can scale to serve more requests.
But the query : SELECT COUNT(*)
FROM bucketTime USE INDEX (indexTest USING GSI)
WHERE enviroment = "/root"
AND startTime > 1356998400000
AND endTime < 1483228799999;
would have similar performance in terms of response time (I think).
It is important to have very fast networking between the nodes. Improving communication performance between data nodes and index/query nodes is a key factor; try to use 10Gb Ethernet.
You can add more RAM to your bucket to increase the memory hit rate, and to the index servers to avoid paging at the index level.
Also, a query that touches a high number of docs (>300K) is a large query.
Is it not possible to filter more?
I don't know if CE 4.1 has any optimizer improvements for this query.
Maybe you can post a question in the CB Server forum.
Also, you can explore the use of Views, which are supported in CE.
Regards
Arturo
Thanks a lot for your availability and the info.
Do you think this query can be transformed into a map reduce?
SELECT SUM(operation) ,user
FROM bucketTime USE INDEX (indexTest USING GSI)
WHERE enviroment = "/root"
AND startTime > 1356998400000
AND endTime < 1483228799999
GROUP BY user
ORDER BY INDEX DESC
Couchbase works a lot better with map-reduce.