Hi again,
This weekend I took the time to test the Couchbase Server 4.5 DP.
In my development environment I set up 3 machines with the same hardware and software, each with a different Couchbase version or configuration.
Machine 1 – Couchbase Server 4.0 CE
Machine 2 – Couchbase Server 4.5 Enterprise DP – Global Indexes
Machine 3 – Couchbase Server 4.5 Enterprise DP – With Memory-Optimized Indexes
I loaded exactly the same documents on each machine.
The document structure is:
{
  "notificationId": "0191aea8-e181-4c8a-a45a-5cdd7a1430e4",
  "logId": "6bdea683-ae6c-421c-b604-cdc4a223fa3a",
  "createdDate": "2016-03-18T18:19:24.8221794Z",
  "action": 3,
  "subActions": [
    2,
    3
  ],
  "operation": 2,
  "propertyId": 1881,
  "propertyName": "Some Name",
  "channelId": 379,
  "dates": [
    {
      "dateFrom": "2016-03-18T00:00:00",
      "dateTo": "2016-03-31T00:00:00"
    }
  ],
  "rates": "rate 1",
  "rooms": "Q1",
  "prices": "110.00€",
  "allotment": "100",
  "closeSales": 0,
  "readedBy": null,
  "isReaded": false,
  "readedDate": null
}
To perform the count test I chose the channelId property.
The number of documents per channelId is:
Id 1000 -> 100000;
Id 1001 -> 200000;
Id 1002 -> 300000;
Id 1003 -> 400000;
The total number of documents is 1000000.
Every document was in memory.
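For anyone who wants to reproduce a similar data set, here is a minimal Python sketch that generates documents with the same channelId distribution. It only fills in a few fields of the structure above with placeholder values, and using notificationId as the document key is an assumption, not necessarily how the original data was keyed. Loading the documents into a bucket is left out.

```python
import uuid

# Documents per channelId, as listed in the post.
DOCS_PER_CHANNEL = {1000: 100_000, 1001: 200_000, 1002: 300_000, 1003: 400_000}


def make_document(channel_id):
    """Build one document with a subset of the fields; values are placeholders."""
    return {
        "notificationId": str(uuid.uuid4()),
        "logId": str(uuid.uuid4()),
        "channelId": channel_id,
        "propertyId": 1881,
        "isReaded": False,
    }


def generate_documents(docs_per_channel=DOCS_PER_CHANNEL):
    """Yield (key, document) pairs following the given distribution.

    Assumption: the key is the notificationId.
    """
    for channel_id, count in docs_per_channel.items():
        for _ in range(count):
            doc = make_document(channel_id)
            yield doc["notificationId"], doc
```

Calling `generate_documents({1000: 10})` yields ten documents, which is handy for a quick check before generating the full million.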
Then I ran the following queries:
Machine 1:
Explain Select count(1) from PortalEvents where channelId = 1000;
{
  "requestID": "869914b1-5cba-42cf-9729-6c8c31801de9",
  "signature": "json",
  "results": [
    {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "index": "portalevents_channelId_propertyId",
          "keyspace": "PortalEvents",
          "limit": 9.223372036854776e+18,
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": ["successor(1000)"],
                "Inclusion": 1,
                "Low": ["1000"]
              },
              "Seek": null
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Fetch",
                "keyspace": "PortalEvents",
                "namespace": "default"
              },
              {
                "#operator": "Filter",
                "condition": "((`PortalEvents`.`channelId`) = 1000)"
              },
              {
                "#operator": "InitialGroup",
                "aggregates": ["count(1)"],
                "group_keys": []
              }
            ]
          }
        },
        {
          "#operator": "IntermediateGroup",
          "aggregates": ["count(1)"],
          "group_keys": []
        },
        {
          "#operator": "FinalGroup",
          "aggregates": ["count(1)"],
          "group_keys": []
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "count(1)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    }
  ],
  "status": "success",
  "metrics": {
    "elapsedTime": "2.872714ms",
    "executionTime": "2.804859ms",
    "resultCount": 1,
    "resultSize": 3061
  }
}
Machine 2/3:
Explain Select count(1) from PortalEvents where channelId = 1000;
[
  {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "IndexScan",
        "covers": [
          "cover ((`PortalEvents`.`channelId`))",
          "cover ((`PortalEvents`.`propertyId`))",
          "cover ((meta(`PortalEvents`).`id`))"
        ],
        "index": "portalevents_channelId_propertyId",
        "keyspace": "PortalEvents",
        "namespace": "default",
        "spans": [
          {
            "Range": {
              "High": ["successor(1000)"],
              "Inclusion": 1,
              "Low": ["1000"]
            }
          }
        ],
        "using": "gsi"
      },
      {
        "#operator": "Parallel",
        "~child": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "Filter",
              "condition": "(cover ((`PortalEvents`.`channelId`)) = 1000)"
            },
            {
              "#operator": "InitialGroup",
              "aggregates": ["count(1)"],
              "group_keys": []
            }
          ]
        }
      },
      {
        "#operator": "IntermediateGroup",
        "aggregates": ["count(1)"],
        "group_keys": []
      },
      {
        "#operator": "FinalGroup",
        "aggregates": ["count(1)"],
        "group_keys": []
      },
      {
        "#operator": "Parallel",
        "~child": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "InitialProject",
              "result_terms": [
                {
                  "expr": "count(1)"
                }
              ]
            },
            {
              "#operator": "FinalProject"
            }
          ]
        }
      }
    ]
  }
]
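The visible difference between the two plans is that the 4.0 plan contains a Fetch operator after the IndexScan, while the 4.5 plan has a "covers" list on the IndexScan and no Fetch. As a rough way to check this on other queries, here is a small Python sketch that walks an EXPLAIN plan (already parsed from JSON) and reports whether it contains a Fetch. The two plans below are trimmed-down versions of the ones above, and the function names are my own.

```python
def operators(plan):
    """Yield every "#operator" name found in a parsed EXPLAIN plan."""
    if isinstance(plan, dict):
        if "#operator" in plan:
            yield plan["#operator"]
        for value in plan.values():
            yield from operators(value)
    elif isinstance(plan, list):
        for item in plan:
            yield from operators(item)


def needs_fetch(plan):
    """Return True if the plan fetches documents after the index scan."""
    return "Fetch" in set(operators(plan))


# Trimmed versions of the two plans shown above.
plan_40 = {"#operator": "Sequence", "~children": [
    {"#operator": "IndexScan", "index": "portalevents_channelId_propertyId"},
    {"#operator": "Parallel", "~child": {"#operator": "Sequence", "~children": [
        {"#operator": "Fetch", "keyspace": "PortalEvents"},
        {"#operator": "Filter"},
        {"#operator": "InitialGroup"},
    ]}},
]}
plan_45 = [{"#operator": "Sequence", "~children": [
    {"#operator": "IndexScan", "covers": ["cover ((`PortalEvents`.`channelId`))"]},
    {"#operator": "Parallel", "~child": {"#operator": "Sequence", "~children": [
        {"#operator": "Filter"},
        {"#operator": "InitialGroup"},
    ]}},
]}]

print(needs_fetch(plan_40))  # True
print(needs_fetch(plan_45))  # False
```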
Then I executed each of the following queries 11 times and took the average time.
Note: I discarded the first run of each query, because it was a lot slower than the other 10.
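The measurement procedure can be sketched in Python roughly like this. `run_query` is a hypothetical callable that executes one query against the server (for example through an SDK or the REST API), so it is not part of any Couchbase library. The `timer` parameter only exists to make the function easy to test.

```python
import time
from statistics import mean


def average_query_time(run_query, runs=11, warmup=1, timer=time.perf_counter):
    """Run the query `runs` times and average all but the first `warmup` runs."""
    timings = []
    for _ in range(runs):
        start = timer()
        run_query()  # hypothetical: executes the N1QL statement once
        timings.append(timer() - start)
    # Drop the warm-up runs, which were a lot slower in my tests.
    return mean(timings[warmup:])
```

With the defaults this gives the average of the last 10 of 11 runs, in seconds.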
Machine 1:
Select count(1) from PortalEvents where channelId = 1000; - Average 11.24s
Select count(1) from PortalEvents where channelId = 1001; - Average 23.83s
Select count(1) from PortalEvents where channelId = 1002; - Average 34.42s
Select count(1) from PortalEvents where channelId = 1003; - Average 51.30s
Machine 2:
Select count(1) from PortalEvents where channelId = 1000; - Average 1.488s
Select count(1) from PortalEvents where channelId = 1001; - Average 2.907s
Select count(1) from PortalEvents where channelId = 1002; - Average 4.237s
Select count(1) from PortalEvents where channelId = 1003; - Average 5.435s
Machine 3:
Select count(1) from PortalEvents where channelId = 1000; - Average 1.087s
Select count(1) from PortalEvents where channelId = 1001; - Average 2.115s
Select count(1) from PortalEvents where channelId = 1002; - Average 3.25s
Select count(1) from PortalEvents where channelId = 1003; - Average 4.108s
All the machines are virtual machines and do not have production-level resources, so we will focus on the percentages only.
Results:
From Machine 1 (CB 4.0 CE) to Machine 2 (CB 4.5 Enterprise DP) we can see an average improvement of 87.91%.
From Machine 2 (CB 4.5 Enterprise DP) to Machine 3 (CB 4.5 Enterprise DP – Memory-Optimized Indexes) we can see an average improvement of 25.48%.
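For reference, these averages are the mean of the per-query percentage reductions. A short Python sketch of the calculation, using only the average times listed above:

```python
from statistics import mean

# Average times in seconds, copied from the measurements above
# (channelId 1000, 1001, 1002, 1003).
TIMES = {
    "machine1": [11.24, 23.83, 34.42, 51.30],
    "machine2": [1.488, 2.907, 4.237, 5.435],
    "machine3": [1.087, 2.115, 3.25, 4.108],
}


def average_improvement(before, after):
    """Mean of the per-query percentage reductions in time."""
    return mean((1 - a / b) * 100 for b, a in zip(before, after))


m1_to_m2 = average_improvement(TIMES["machine1"], TIMES["machine2"])
m2_to_m3 = average_improvement(TIMES["machine2"], TIMES["machine3"])
print(f"{m1_to_m2:.2f}%")  # 87.91%
print(f"{m2_to_m3:.2f}%")  # 25.48%
```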
As we can see, from Couchbase Server 4.0 CE to Couchbase Server 4.5 DP there is a great improvement.
My production environment was taking 5.30s to count 90000 documents out of 9000000. With the new version it should take around 600ms, if my math is correct. This is very good news.
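The estimate simply applies the 87.91% average improvement to the current production time. This assumes production would improve by the same percentage as the test VMs, which is not guaranteed.

```python
def projected_time(current_seconds, improvement_percent):
    """Scale a measured time by a percentage improvement."""
    return current_seconds * (1 - improvement_percent / 100)


# Assumption: production improves by the same 87.91% seen on the test VMs.
projected = projected_time(5.30, 87.91)
print(f"{projected * 1000:.0f} ms")  # 641 ms
```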
What I didn’t like to see was that:
From Machine 2 channelId 1000 (count returns 100000) to Machine 2 channelId 1001 (count returns 200000) we can see an increase of 95.36%.
From Machine 2 channelId 1001 (count returns 200000) to Machine 2 channelId 1002 (count returns 300000) we can see an increase of 45.75%.
From Machine 2 channelId 1002 (count returns 300000) to Machine 2 channelId 1003 (count returns 400000) we can see an increase of 28.27%.
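The step-to-step increases, and the time per counted document, can be derived from the Machine 2 averages with a sketch like the one below. If the per-document figure stays roughly flat, that would suggest the time grows about linearly with the number of matching documents, although four data points on a virtual machine are not enough to be sure.

```python
# Machine 2 averages (seconds) and the count each query returns.
COUNTS = [100_000, 200_000, 300_000, 400_000]
MACHINE2 = [1.488, 2.907, 4.237, 5.435]


def step_increases(times):
    """Percentage increase from each measurement to the next one."""
    return [(b / a - 1) * 100 for a, b in zip(times, times[1:])]


def microseconds_per_document(times, counts):
    """Average time spent per counted document, in microseconds."""
    return [t / n * 1e6 for t, n in zip(times, counts)]


increases = step_increases(MACHINE2)
per_doc = microseconds_per_document(MACHINE2, COUNTS)
print([f"{x:.2f}%" for x in increases])
print([f"{x:.2f}" for x in per_doc])
```

On these numbers the per-document time comes out between roughly 13.6 and 14.9 microseconds for all four queries.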
We probably need a bigger test case with more documents, but from this we can see that the count time increases substantially as the number of documents increases.
Do you have any more numbers on this? Is this being taken into consideration for the 4.5 version?
Can I expect more improvements to the count query?
I’m really happy with the improvement in the new version. Is there a release date for the Community Edition of version 4.5?
Thank you
Best Regards