Business applications have requirements: take customer orders, deliver customer orders, track shipping, generate inventory report, end of the day/month/quarter business report, generate business dashboards and more. These requirements evolve slowly. They remain even when you choose a NoSQL database.
On NoSQL databases, challenges are addressed by a multitude of technologies and workarounds. Here are some of them:
- Copying the data with a different key to make the scans easier.
- Fetch all of the data needed to the application and then generate reports
- Load the data into relational databases to generate the report.
- Products provide workarounds with map-reduce views, APIs, etc.
- Finally, SQL itself has been unreasonably effective for both structured and semi-structured data. NoSQL databases like Couchbase, Cassandra, CosmosDB have extended SQL for JSON and wide-column data model.
NoSQL has evolved from “NO SQL” to Not Only SQL. If you’re interested in some of the evolutionary background combined with a history of SQL evolution to support semi-structured data, I suggest the following interviews.
- Ravi Mayuram’s interview with Don Chamberlin[2017]: https://youtu.be/-U_UjqnhMBI?t=3492
- A panel discussion between Ravi Mayuram, Don Chamberlin and Prof. Mike Carey [2018]: https://www.youtube.com/watch?v=LAlDe1w7wxc
A lot of NoSQL databases claim “SQL Support”. The SQL standard is wide and deep, covered in nine voluminous books. No one, neither Oracle nor SQL Server, supports everything in the standard despite decades of work. So, the NoSQL databases have a long way to go in catching up. So, a detailed evaluation of the SQL support is worth your while.
Here are the criteria you should use to evaluate the SQL support in NoSQL databases.
- Language support: What statements, data types, operations (joins, grouping, aggregation, windowing, pagination, etc)
- Indexing support: Indexes are key to performance, especially in the workloads for interactive applications.
- Optimizer: Query rewrite, choosing the right access path, creating the optimal query execution path is what makes SQL a successful 4GL. Some have a rule-based optimizer, some have a cost-based optimizer, some others have both. Evaluating the quality of the optimizer is critical. Typical benchmarks (TPC-C, TPC-DS, YCSB, YCSB-JSON) won’t help you here.
- As the saying goes: ” There are three things important in databases: performance, performance, and performance”. It’s important to measure the performance of your workload. YCSB and the extended YCSB-JSON will make this evaluation easier.
- SDKs: Rich SDKs and language support speed up your development.
- BI tool support: For large data analysis, support from BI tools usually via standard database connectivity drivers is important.
In this article, I’ll compare and contrast the SQL language support in Cassandra, CosmosDB, Couchbase, and MongoDB via their respective implementations. To be sure, MongoDB doesn’t support SQL but have some comparative commands.
I’ve divided the analysis into multiple sections. WordPress formatting makes these tables too big. Here is a PDF version that’s compact and easy to read. [Click on the image to view the PDF]
SQL Support approach:
SQL | SQL is a declarative language, select-join-project operations serving as the foundation. |
Cassandra | CQL: SQL inspired language for Cassandra. |
CosmosDB | Supports SQL along with MongoDB API, Gremlin (for the graph), etc. Simple select-from-where-order-by support. |
Couchbase | N1QL: SQL for JSON.
Couchbase has two implementations of N1QL: query service and analytics service. |
MongoDB | Javascript based, simplified SQL command based query. |
INPUT and OUTPUT
SQL | INPUT: Set of rows (tuples)
OUTPUT: A set of rows (tuples) |
Cassandra | INPUT: Sets of rows
OUTPUT: Set of rows |
CosmosDB | INPUT: Sets of JSON
OUTPUT: Set of JSON |
Couchbase | INPUT: Sets of JSON
OUTPUT: Set of JSON |
MongoDB | INPUT: Sets of JSON
OUTPUT: Set of JSON |
SELECT: FROM Clause
SQL | Specifies the datasource tables (relations) |
Cassandra | FROM clause with only one table allowed. Now joins, subqueries or expressions are allowed. From clause interpretation is same as SQL. |
CosmosDB | FROM clause supports a single collection and self joins (same as UNNEST in Couchbase). |
Couchbase | FROM clause with multiple keyspaces (subset of bucket), subqueries, expressions. Same as SQL |
MongoDB | db.t1.find() for single table SELECT.
db.t1.aggregate() is used as the generalized query framework. aggregate() can join with additional collections using $lookup operator. There can be multiple $lookup operators in the aggregation pipeline framework making it the closest cousin of the SQL FROM clause. |
SELECT: WHERE Clause
SQL | Criteria for selecting a row |
Cassandra | Standard boolean expressions. No subqueries. |
CosmosDB | Same as SQL |
Couchbase | Standard boolean expressions and subqueries. |
MongoDB | db.t1.find({x:10});
aggregate() has the $match clause. |
SELECT: SELECT Clause
SQL | Projection clause |
Cassandra | SELECT clause is same as SQL. |
CosmosDB | SELECT clause is same as SQL. |
Couchbase | SELECT clause is same as SQL. |
MongoDB | db.t1.find({x:10}, {a:1, b:1})
$project operator in the aggregation pipeline |
SELECT: CTE – Common Table Expression
SQL | Dynamically defined data source (table, resultset) |
Cassandra | Unsupported |
CosmosDB | Unsupported |
Couchbase | WITH clause; same as SQL (in v6.5). Recursive CTE is unsupported |
MongoDB | Unsupported |
SELECT: Subquery
SQL | Subquery: Subqueries in the FROM clause, WHERE clause, anywhere an expression is allowed. |
Cassandra | Unsupported |
Cosmosdb | Unsupported |
Couchbase | Supports both correlated and non-correlated subqueries. |
MongoDB | Unsupported in find(). Can add $match in the pipeline, but not exactly an equivalent of a subquery. |
SELECT: GROUP BY
SQL | Group the rows based on one or more expressions. Quite useful in reporting and aggregation for groups. |
Cassandra | Supported; Same as SQL. |
Cosmosdb | Unsupported. Can only do the aggregation on the whole resultset. |
Couchbase | Supported; Similar as SQL. |
MongoDB | $group operator in aggregate() pipeline |
SELECT: HAVING clause
SQL | Filtering after the aggregation. |
Cassandra | Unsupported |
CosmosDB | Unsupported |
Couchbase | HAVING clause; Same as SQL |
MongoDB | $match after the grouping and aggregation. |
SELECT: ORDER BY Clause
SQL | The final order of the results produced by the query block |
Cassandra | ORDER BY clause; Same as SQL. |
Cosmosdb | ORDER BY clause; Same as SQL. |
Couchbase | ORDER BY clause; Same as SQL. |
MongoDB | db.t1.find().sort({a:1, b:-1});
aggregate() has $sort to specify the result order. |
SELECT: LIMIT, OFFSET Clause
SQL | Used for pagination of the resultset |
Cassandra | “LIMIT is supported.
OFFSET is unsupported.” |
CosmosDB | LIMIT (TOP) and OFFSET clause; Similar to SQL |
Couchbase | LIMIT and OFFSET clause; Same as SQL |
MongoDB | skip(), limit() methods with find(). $offset, $limit with aggregate(). |
SELECT: JOIN clause
SQL | INNER JOIN, LEFT/RIGHT/FULL outer joins. |
Cassandra | Joins are unsupported. Applications will have to model the data to avoid joins or do the joins in the application layer. |
Cosmosdb | Only self JOINs. No INNER/LEFT/RIGHT/etc joins. |
Couchbase | Supports INNER, LEFT OUTER, NEST, UNNEST and limited RIGHT outer. Same syntax as SQL. FULL OUTER join is unsupported. |
MongoDB | Limited LEFT OUTER JOIN only via $lookup operator. No join on array elements or expressions. |
SELECT: Aggregation
SQL | Aggregation |
Cassandra | Simple aggregation on the whole result is supported. Aggregation with GROUP BY is unsupported. |
CosmosDB | Simple aggregation on the whole result is supported. Aggregation with GROUP BY is unsupported. |
Couchbase | SUM, AVG, COUNT, MAX, MIN, VARIANCE: same as SQL |
MongoDB | $sum, $count, $avg with grouping support |
SELECT: aggregate functions
SQL | Simple aggregation on the whole result is supported. Aggregation with GROUP BY is unsupported. |
Cassandra | Unsupported |
Cosmosdb | Unsupported |
Couchbase | Supports SQL Standard window analytical functions in 6.5. |
MongoDB | Unsupported |
SELECT : Window (analytics/aggregation) functions
SQL | Window functions for running totals using the OVER() clause |
Cassandra | Unsupported |
CosmosDB | Unsupported |
Couchbase | Supports SQL Standard window analytical functions in 6.5.
See details: https://www.couchbase.com/blog/json-to-insights-fast-and-easy/ https://www.couchbase.com/blog/get-a-bigger-picture-with-n1ql-window-functions-and-cte/ |
MongoDB | Unsupported |
INSERT: Single row/document insert.
SQL | Insert a single row |
Cassandra | INSERT statement |
CosmosDB | API Insert |
Couchbase | INSERT statement |
MongoDB | db.t1.save() |
INSERT: Multiple row/document insert.
SQL | INSERT statement |
Cassandra | Unsupported |
Cosmosdb | Unsupported |
Couchbase | INSERT with Multiple documents |
MongoDB | db.t1.insert() |
DELETE Statement
SQL | Delete one or more documents |
Cassandra | DELETE statement; Same as SQL |
Cosmosdb | API delete |
Couchbase | DELETE statement; Same as SQL |
MongoDB | db.t1.delete() |
UPSERT statement
SQL | INSERT. UPDATE if exists. |
Cassandra | Unsupported |
Cosmosdb | Unsupported |
Couchbase | UPSERT statement. |
MongoDB | Unsupported |
UPDATE Statement
SQL | |
Cassandra | UPDATE; Same as SQL |
CosmosDB | API update |
Couchbase | UPDATE; Same as SQL |
MongoDB | db.t1.update() |
MERGE: Merge one relation (set of rows) to another.
SQL | Merge a set of rows (documents) into another. |
Cassandra | Unsupported |
CosmosDB | Unsupported |
Couchbase | MERGE statement, same as SQL. |
MongoDB | Unsupported |
PREPARE statement
SQL | Parse, analyze and create an execution plan. |
Cassandra | Supported. I see excamples of preparedStatement() in Java SDK.dd |
CosmosDB | Unsupported |
Couchbase | Supported; PREPARE |
MongoDB | Unsupported |
EXECUTE
SQL | Execute an ad-hoc or prepared statement. |
Cassandra | Supported in Java. |
Cosmosdb | Unsupported |
Couchbase | Supported, similar to SQL. |
MongoDB | Unsupported |
GRANT/REVOKE
SQL | Grant/REVOKE permissions for specific operation on the data set |
Cassandra | GRANT, REVOKE |
CosmosDB | API support |
Couchbase | GRANT ROLE, REVOKE ROLE |
MongoDB | TBD?? |
DESCRIBE statement
SQL | Describes the schema of a table |
Cassandra | DESCRIBE |
Cosmosdb | Unsupported |
Couchbase | INFER describes the schema of the documents |
MongoDB | Compass tool — graphical only. |
TRUNCATE statement
SQL | Truncates the data in the table without altering security or physical schema. |
Cassandra | TRUNCATE |
CosmosDB | Unsupported |
Couchbase | FLUSH operation |
MongoDB | Unsupported. Workaround via remove collection, recreate with the same security settings. |
Value Logic (boolean values)
SQL | True, False, NULL (Unknown) |
Cassandra | True, False, NULL (Unknown) |
Cosmosdb | True, False, NULL (Unknown) |
Couchbase | True, False, NULL (Unknown), Missing
https://docs.couchbase.com/server/4.0/n1ql/n1ql-language-reference/booleanlogic.html |
MongoDB | True, False, NULL (Unknown) |
Query Optimizer: Type of Optimizer
SQL | Rule-based and cost-based optimizer. Does query rewrites, index selection, join ordering, join type selection and position of the tablers (inner/outer, hash-table build/probe) |
Cassandra | Rule-based optimizer. Index selection for the single table is done since there are no joins. |
CosmosDB | Rule-based optimizer mainly does index selection. |
Couchbase | Rule-based optimizer, index(es) selection. Block Nested Loop join by default but supports hash join via user hint in the query. |
MongoDB | “Shape-based” optimizer according to the docs. Each new query is matched with a query based on the “shape”. The first time a query is run, the optimizer does index selection, but when there are multiple candidates, it’ll run multiple queries concurrently to see who returns the results first. |
Query Optimizer: Index Selection
SQL | Yes |
Cassandra | Yes |
CosmosDB | Yes |
Couchbase | Yes |
MongoDB | Yes |
Query Optimizer: Query Rewrite
SQL | Rewrite parts of the query to logical equivalent to better performance. E.g. Subquery rewrite, view folding, join type conversion, constant expression evaluation, etc. |
Cassandra | None |
Cosmosdb | No known rewrites |
Couchbase | Basic query rewrite. LEFT OUTER to INNER when applicable, constant expression evaluation. |
MongoDB | None |
Query Optimization: JOIN Type
SQL | Choose the most efficient index, from the available join types. |
Cassandra | Not applicable since joins are unsupported. |
CosmosDB | Not clearly documented. |
Couchbase | Nested loop by default. Hash join by user hint. |
MongoDB | Just the nested loop is supported. |
TRANSACTION support.
SQL | ACID support with multi-row and multi-statement support. |
Cassandra | No |
Cosmosdb | Yes |
Couchbase | No |
MongoDB | Yes, in 4.0 |
Indexes
SQL | Data structures maintained to speed up the query performance. Indexes are |
Cassandra | Supports primary, secondary, array-indexes. Need to install and index the data on SOLR for a search index. |
CosmosDB | Indexes everything by default: scalars, arrays. No support for the search index. |
Couchbase | Supports primary, secondary, composite, functional, adaptive, search, spatial, partitioned and replica index. Indexes are eventually consistent. |
MongoDB | Supports primary, secondary, composite, search, spatial, partitioned and replica index. The search index is simplistically created on a B-Tree. |
SQL: Datatype support.
SQL | Extensive numerical, character, date-time data type support. |
Cassandra | Numeric, decimal, double. Int, float, varint, Timestamp, collection (set, list) |
CosmosDB | JSON data types: numeric, string, boolean, object, arrays |
Couchbase | JSON data types: numeric, string, boolean, object, arrays |
MongoDB | JSON data types: numeric, string, boolean, object, arrays and custom extensions for the timestamp datatype. |
Conclusion:
The popular NoSQL databases have tried to extend and support the declarative SQL for the respective data model and architectures. So, it’s important to understand the capabilities, limitations of the features and architecture during the evaluation.
References:
[…] transactions and their semantics. SQL as a query language has been unreasonably effective even in NoSQL database systems. However, few NoSQL database systems support transactions. The ones that support come with a long […]