Data analytics capabilities in Couchbase just got more robust with the 7.0 release.
I’m excited to announce two new powerful capabilities now available as part of Couchbase Analytics: 1) the CUBE
functionality, and 2) ROLLUP
aggregation, both to help you aggregate data across multiple document attributes.
The addition of the CUBE
extension functionality to the Couchbase Analytics Service is part of the Couchbase Server 7.0 release. (We covered the ROLLUP
functionality in an article last week.)
A Walkthrough Example of the New CUBE Functionality
As in SQL, the CUBE aggregation capability is an extension of the GROUP BY clause.
The CUBE extension sub-clause allows you to generate subtotals for all combinations of grouping columns specified in the GROUP BY clause. Specifically, the result includes extra rows representing subtotals in your data – commonly referred to as super-aggregate rows – along with the usual grand total row.
For example, let’s say you have a sample analytics collection called Orders that tracks ecommerce information. The key data elements you’re most interested in tracking are region/state, order date, quantity and price. (And let’s say you calculate Sales as Sales = quantity * price.)
Below is a sample of the ecommerce data for our example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
{ "orderDate": "2020-11-30", "region": "California", "qty": 5, "price": 596.37 }, { "orderDate": "2020-05-08", "region": "California", "qty": 3, "price": 814.69 }, { "orderDate": "2020-03-17", "region": "Connecticut", "qty": 2, "price": 270.04 }, { "orderDate": "2020-03-11", "region": "Colorado", "qty": 4, "price": 795.73 } |
Now, let’s say your business users would like to know the following from your ecommerce data:
1) Total sales for all years by region
2) Total sales for all regions by year
3) Total sales for each region by year
4) Total sales for all regions and all years
The CUBE extension is a good use case to answer these aggregation-based data analysis questions. Your data engineer or data analyst simply uses the N1QL query shown below within the Couchbase Analytics Service.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
-- Generates four grouping sets: -- 1. All Region & All Years (Grand Total) -- 2. All Regions, Year -- 3. All Years, Region -- 4. Region, Year SELECT IFNULL(region,"All Regions") Region, IFNULL(year, "All Years") Year, ROUND(SUM(o.qty * o.price),0) Sales FROM orders o LET year = DATE_PART_STR(o.orderDate, "year"), region = o.region GROUP BY CUBE(region, year) ORDER BY region, year |
In the above N1QL query, we’ve used year
and region
as variables for convenience. As a result, CUBE(region, year)
generates the following query results for our sample data:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
{ "Region": "All Regions", "Year": "All Years", "Sales": 3372854458}, { "Region": "All Regions", "Year": 2020, "Sales": 3371122342}, { "Region": "All Regions", "Year": 2021, "Sales": 1732116}, { "Year": "All Years", "Region": "Alabama", "Sales": 66383297}, { "Year": "All Years", "Region": "California", "Sales": 66108045}, { "Region": "California", "Year": 2020, "Sales": 66053233}, ... |
Note that the output is exactly what your users were originally looking for:
- At the top of the results, a grand total shows the total sales over all years and regions. In the query output, the
NULL
values in the country and region attributes are transformed to “All Years” and “All Regions” respectively to make the output more readable. - Then, you’ll see the result of another subtotal summarizing All Regions by each Year, e.g., 2020 and 2021, displaying the total sales. In these JSON fields, values in the
region
attribute are transformed fromNULL
to “All Regions”. - After that, we see another subtotal summary for “All Years” displaying the total sales for each region, e.g., Alabama & California. In these fields, values in the
region
attribute are transformed fromNULL
to “All Regions”. - Lastly, the output shows the total sales rolled up for each region for each year. In this example, Sales for California for the Year 2020
Conclusion
I hope you are excited about this new aggregation CUBE capability that extends the powerful analytical expressions available to you in Couchbase Analytics. For more information, check out the CUBE
documentation. I hope you find this new capability useful for future data analytics projects at your enterprise.
I look forward to your feedback on the Couchbase Forums – let us know what you think!