Couchbase Analytics
Couchbase Data Platform now supports hybrid transaction/analytical processing (HTAP) workloads. Plus, our massively parallel processing (MPP) query engine lets users run ad-hoc analytical queries that deliver faster insights without impacting performance.
Couchbase Analytics reduces the time to insight on operational data by providing these key features:
Fast ingest: Make data available for analytical processing in milliseconds
NoETL for NoSQL: Process JSON data in its natural form without any transformation or schema design
Workload isolation: Run ad hoc queries without impacting application performance
Ad hoc querying: Business users can explore data and perform complex joins and aggregations
N1QL for analytics: It’s the first commercial implementation of the SQL++ language for querying schemaless semi-structured JSON data
Many of the popular visualization tools don’t natively support JSON integration but do support connectivity through standard technologies like JDBC, ODBC, or ADO.NET. With the CData drivers, any BI tool that supports SQL-based connectivity (like Tableau, Power BI, MicroStrategy, Excel, and more) will be able to visualize data stored in Couchbase Analytics.
As an example, I’ve enclosed a screenshot of a visualization in Tableau of data coming from Couchbase Analytics.
CData Drivers for Couchbase
The CData Drivers for Couchbase provide a SQL interface for NoSQL data sources, like Couchbase Analytics. Through innovative flattening techniques and a powerful, built-in SQL engine, CData drivers seamlessly translate SQL queries into SQL++, leveraging the server-side capabilities of Couchbase Analytics to provide real-time data for BI and analytics. This is especially critical for tools that generate SQL queries internally to request data from the connected data sources. For example, given a sample ‘cars’ document, you could expect Tableau to generate a SQL query. That SQL query will be translated to SQL++ by the CData driver.
Sample cars document
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
{ "cars":{ "EngineTemp":214, "TPMS":"ON", "TirePressure":[ 33, 36, 34, 34 ], "Indicators":{ "WindshieldWiper":"OFF", "Headlamp":"ON", "BrakePedal":"OFF", "Ignition":"OFF" }, "ABS":"OFF", "Location":{ "lat":34.25320048276987, "long":-117.33252478191595 }, "Fuel":88, "TransmissionGear":2, "AcceleratorPedalPosition":15, "VIN":"RU2S4T1TR1TR30UR", "ModelType":"Medium SUV", "OutsideTemp":150, "Speed":25, "EngineOil":36 } } |
Sample SQL Query
1 |
SELECT ModelType, COUNT(*) AS Total FROM cars GROUP BY ModelType; |
The SQL query is translated by the driver to SQL++.
Sample SQL++ Query
1 |
SELECT `cars`.`ModelType`, COUNT(*) AS `total` FROM `cars` GROUP BY `cars`.`ModelType`; |
While the differences between the SQL and SQL++ queries may seem small, they are exaggerated when working with tools that generate SQL queries based on a graphical user interface (like Tableau). Since the user has no control over the query generated, these differences mean that a SQL to SQL++ translation is not only critical, but necessary for performing BI, analytics, and reporting in SQL-based tools. Since the CData drivers translate to pure SQL++, they are able to leverage the powerful server-side processing capabilities of the Analytics service.
SQL to SQL++ Interface
How do the CData drivers create the SQL interface for JSON data? As we explain, we’ll refer to the above sample document.
Object Flattening
CData drivers flatten objects using a dot notation to represent the original hierarchical structure. For example, to access the WindshieldWiper field of the Indicator object, you would refer to the column cars.Indicator.WindshieldWiper. Many tools do not allow periods in column names, so where needed, the CData drivers will create temporary schema definitions in memory to enable connectivity from such tools. Below you’ll see an example SQL query, the corresponding SQL++ query, and the query results.
1 2 |
SQL: SELECT Indicators_WindshieldWiper FROM cars; SQL++: SELECT `cars`.`Indicators`.`WindshieldWiper` FROM `cars`; |
Array Flattening
CData drivers flatten arrays within JSON documents using 0-based indexing, similar to what you see in other programming languages. For example, to access the 1st and 2nd entries in the TirePressure array, you would refer to the columns cars.TirePressure.0 and cars.TirePressure.1. Below you’ll see an example SQL query, the corresponding SQL++ query, and the query results.
1 2 |
SQL: SELECT TirePressure_0, TirePressure_1 FROM cars; SQL++: SELECT `cars`.`TirePressure`[0], `cars`.`TirePressure`[1] FROM `cars`; |
As the Analytics service and SQL++ interfaces grow and evolve, so to will the CData drivers, implementing new query features as they are added and continuing to leverage the powerful data processing capabilities of the Analytics service to allow Couchbase users to quickly and easily work with, visualize, and report on their data.
Try Couchbase 6.0 and CData today
Download the new release of Couchbase and the latest CData drivers to start working with Couchbase data in the SQL-enabled tools and applications you already use. We look forward to your feedback on the Couchbase forums.
Acknowledgements
This blog post is a collaboration between Couchbase and CData. A big thank you to Jerod Johnson, Technology Evangelist at CData software, for his contribution to this blog post.