I’m excited to announce two new powerful data analysis capabilities now available as part of Couchbase Analytics: 1) ROLLUP
aggregation, and 2) CUBE
functionality, both to help you aggregate data across multiple document attributes.
The addition of the ROLLUP
aggregation functionality to the Couchbase Analytics Service is part of the Couchbase Server 7.0 release. (We’ll cover the CUBE
functionality in an article next week.)
ROLLUP Aggregation Example: Ecommerce Data
The ROLLUP
aggregation function is an extension of the GROUP BY clause.
The ROLLUP
sub-clause allows you to include extra rows – commonly referred to as super-aggregate rows – that represent subtotals in your data along with the grand total row.
For example, let’s say you have an analytics collection called Sales that tracks ecommerce information for your business. The key data elements you’re most interested in tracking are country, region/state, product, quantity and price. (And let’s say you calculate Sales as Sales = quantity * price.)
Below is a subset of our example ecommerce data collection:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
{ "Country": "US", "Region": "California", "Product": "Rubber Keyboard", "Qty": 1, "Price": 35.5 }, { "Country": "US", "Region": "Colorado", "Product": "Gloves", "Qty": 4, "Price": 6.95 }, { "Country": "US", "Region": "Connecticut", "Product": "Cotton Ties", "Qty": 2, "Price": 12.75 } … |
Now, let’s say your users need to determine the following information using the data above:
1) Total sales for all countries and regions
2) Total sales for all regions by country
3) Total sales for each region
The ROLLUP
aggregation sub-clause is an excellent choice for answering these sorts of data analysis questions. Your data engineer or data analyst simply uses the N1QL query shown below.
1 2 3 4 5 6 7 8 |
SELECT IFNULL(country,"All Countries") Country, IFNULL(region,"All Regions") Region, ROUND(SUM(o.qty * o.price),0) Sales FROM orders o WHERE region LIKE “C%” LET country = o.country, region = o.region GROUP BY ROLLUP(country, region) ORDER BY country ASC, region ASC, Sales DESC; |
In the above N1QL query, we use country and region as variables. The ROLLUP
sub-clause assumes a hierarchy among the specified data attributes.
For example, if the input column is (country, region)
, the hierarchy country
is higher or greater than region
. This is why it makes sense to use ROLLUP
to generate the subtotals and the grand total for reporting purposes.
Based on the query above, ROLLUP(country, region)
generates the following results:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
{ "Country": "All Countries", "Region": "All Regions", "Sales": 19921991 }, { "Country": "US", "Region": "All Regions", "Sales": 199219915 }, { "Country": "US", "Region": "California", "Sales": 6610804 }, { "Country": "US", "Region": "Colorado", "Sales": 6569542 } { "Country": "US", "Region": "Connecticut", "Sales": 6741644 } ... |
Note that the output is exactly what your example users were originally looking for:
- At the top of the results, a grand total summary JSON field shows the total sales of all countries and regions. In these rows, the NULL values in the country and region attributes are transformed to “All Countries” and “All Regions” respectively to make the output more readable.
- After that, for each country (in this case, the U.S.), an extra subtotal summary JSON data element appears displaying the total sales. In these, values in the region attribute is transformed from NULL to “All Regions”
- Lastly, the output shows the total sales rolled up for each region within the country. In this example, sales for California, Colorado and Connecticut are shown within the U.S.
Conclusion
This has been just a quick look at the new ROLLUP
aggregation feature in the Couchbase Analytics Service. For more information, check out the ROLLUP
documentation. I hope you find this new capability useful for future data analysis projects at your enterprise.
I look forward to your feedback on the Couchbase Forums – let us know what you think!