Recursive Common Table Expressions (CTEs) and Oracle’s CONNECT BY are well-known SQL constructs among RDBMS users, facilitating the delegation of complex, interdependent data structure exploration to the database layer for enhanced processing efficiency.
These constructs are crucial for querying interdependent data structures, a common requirement across various industries, including finance, supply chain management, customer relationship management (CRM), travel booking, and more recently, social networks. Recognizing their importance, all major relational database management systems (RDBMS) such as PostgreSQL, MySQL (from version 8.0), SQL Server, Oracle, and SQLite offer support for Recursive CTEs.
In contrast, NoSQL databases, which are designed to manage a diverse array of data models like documents, key-value, wide-column, and graph data, prioritize scalability, high availability, flexibility, and performance in distributed systems. In these environments, the CTE concept, recursive or not, isn’t directly addressed. Users often turn to specialized solutions such as graph databases—Cypher for Neo4J and AQL for ArangoDB, for instance—to handle complex data structures.
Couchbase sets itself apart with SQL for JSON, offering a unique approach to Recursive CTEs that also extend its multi model support. This blog will delve into three primary topics:
-
- How you can leverage a single DBMS as Couchbase for complex data structures for a number of use cases, in the same way as RDBMS could, but without the need for a more dedicated database.
- The use of Couchase SQL++ construct to query, transform and project these complex relationship using a SQL construct that are familiar to RDBMS users.
- Best practices to manage resource consumption with Recursive CTE.
Bill of Materials use case
The BOM is a critical component in manufacturing and engineering, detailing the raw materials, parts, and components needed to manufacture a product. It often has a hierarchical structure, where parts are made up of other parts or materials.
This example will include basic components and sub-components of a desktop computer.
Component ID | ComponentName | ParentComponentID | Quantity |
1 | Desktop Computer | null | 1 |
2 | Motherboard | 1 | 1 |
3 | CPU | 2 | 1 |
3 | CPU fan | 3 | 1 |
4 | GPU | 2 | 1 |
5 | RAM | 2 | 4 |
6 | M.2 drive | 2 | 1 |
7 | SSD | 2 | 1 |
8 | Power Supply | 1 | 1 |
9 | Case | 1 | 1 |
10 | Case cooling fans | 1 | 4 |
Let’s say we want to list all parts and sub-parts needed to build a desktop computer, along with the quantities required for each.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
WITH RECURSIVE ComponentHierarchy AS ( SELECT ComponentID, ComponentName, ParentComponentID, Quantity, 1 AS Lvl -- Depth level in the hierarchy FROM components WHERE ParentComponentID IS NULL -- Starting point: the Desktop Computer itself UNION ALL SELECT c.ComponentID, c.ComponentName, c.ParentComponentID, ch.Quantity * c.Quantity AS Quantity, -- Calculate total quantity required at each level ch.lvl + 1 -- Increment level for each recursion step FROM components c JOIN ComponentHierarchy ch ON c.ParentComponentID = ch.ComponentID ) SELECT * FROM ComponentHierarchy; |
This query initializes the recursion with the top-level item (the Desktop Computer) and recursively joins the components table with itself to traverse down the hierarchy, adjusting quantities as needed for each component and sub-component.
The result will list all parts needed for the Desktop Computer, including the quantity of each and their hierarchical level, which can be useful for understanding the assembly structure or for inventory and ordering purposes.
Explanation
The CTE starts with the Bicycle (where ParentPartID is NULL) and then recursively finds all components and sub-components.
The Quantity is adjusted at each level to reflect the total number required for one Bicycle.
The Level column, though not strictly necessary for all BOM analyses, helps understand the depth of each part within the hierarchy.
This approach allows for a detailed breakdown of all materials and components required for manufacturing a product, essential for inventory management, cost estimation, and production planning in manufacturing operations.
Social Networking use case
A common application in a social networking use case is to find the degrees of connection between two users—essentially, how users are connected through a chain of mutual friends. Let’s say we need to determine the shortest path (in terms of degrees of connection) between two users in a social network. This can help in features like suggesting friendships or understanding network dynamics.
Consider we have the following users and how they are friends with each other, eg. Alice is friends with Bob, and also with Charlie. But Alice is not friends with Dana.
In this expanded network, users are connected in various ways, creating multiple paths through which users can be connected.
Let’s find the degrees of connection between Alice[1] and Frank[6], including the names of users along the path.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
WITH RECURSIVE ConnectionPath AS ( SELECT u1.UserID, u1.UserName AS StartUser, u2.UserID AS FriendID, u2.UserName AS FriendName, 1 AS Degree FROM user_connections uc JOIN users u1 ON uc.UserID = u1.UserID JOIN users u2 ON uc.FriendID = u2.UserID WHERE u1.UserID = 1 -- Starting user (Alice) UNION ALL SELECT cp.UserID, cp.StartUser, u.UserID AS FriendID, u.UserName AS FriendName, cp.Degree + 1 FROM ConnectionPath cp JOIN user_connections uc ON cp.FriendID = uc.UserID JOIN users u ON uc.FriendID = u.UserID WHERE uc.FriendID NOT IN (cp.UserID) -- Avoid loops by not revisiting the start user ) SELECT * FROM ConnectionPath WHERE FriendID = 6 -- Target user (Frank) ORDER BY Degree ASC; |
Explanation
-
- Base Case: The CTE starts by identifying direct connections of Alice (UserID 1), including user names for readability.
- Recursive Step: It then recursively finds friends of those connections, extending the search and tracking the degree of connection. The JOINs ensure that user names are included for both the starting user and their friends at each step.
- Termination and Filtering: The recursion continues until it finds connections to Frank (UserID 6). The query filters for paths leading to Frank and orders the results by the degree of connection to identify the shortest paths.
This query demonstrates how to trace and enumerate paths through a social network, including user names for clarity. It provides a foundation for more complex analyses, such as identifying all mutual connections or exploring network structures.
Graph network traversal use case
For this use case, I will use a summarized version of the route data from America Airlines. Note that this is not from the Couchbase travel-sample. In this example we use Couchbase SQL++ Recursive CTE query to find all flights from LAX to MAD with < 2 stops from this sample dataset. Note that this sample data is not based on the travel-sample
, but a simplified version of the AA routes for 2008.
source_airport_code | destination_airport_code | airline |
LAX | MAD | AA |
LAX | LHR | AA |
LHR | MAD | AA |
LAX | OPO | AA |
OPO | MAD | AA |
MAD | OPO | AA |
SQL++ Query | Results |
/* List all routes from LAX to MAD with < 2 stops */ WITH RECURSIVE RouteCTE AS ( SELECT [r.source_airport_code, r.destination_airport_code] AS route, OPTIONS {“levels”:3} |
[ { “route”: [ “LAX”, “MAD” ] }, { “route”: [ “LAX”, “LHR”, “MAD” ] }, { “route”: [ “LAX”, “OPO”, “MAD” ] } ] |
Explanation
-
- The routeCTE starts with all flights departing from LAX.
- The recursive part of the routeCTE looks for flights connecting from the lastStop in the current route back to other airports, avoiding routes that loop back to LAX.
- The route array accumulates the sequence of airport codes to show the path taken.
- The query outputs all routes that end in MAD, detailing the paths found.
Best Practices Recursive CTE
When using Recursive Common Table Expressions (CTEs) in SQL++, developers should be aware of the implications of the recursive nature and the cost for query processing. Here are the best practices:
Set limits for the recursion depth – Always set a limit on the recursion depth to prevent infinite loops and excessive resource consumption. Use a counter or a condition within the recursive CTE to control how deep the recursion can go, include the options limits.
Monitor Performance – Recursive CTEs can be resource-intensive. Monitor performance closely, especially for long or complex queries, and optimize them as necessary. This might involve indexing,or breaking down overly complex CTEs.
Avoid Unnecessary Complexity – Keep the logic within the recursive part of the CTE as simple as possible. Overly complex conditions or computations can significantly degrade performance. Check the JOIN condition for correctness.
Ensure Correct Data Structure – Verify that your data is structured correctly for recursion. Incorrect or malformed data can lead to incorrect results or inefficient queries.
Test Extensively – Thoroughly test recursive CTEs with various datasets, including edge cases. This helps in catching any issues with infinite loops, incorrect results, or performance bottlenecks.
Set the memory quota – Set the memory quota either at the request or node level to avoid excessive of memory usage in the recursive query.
Limitations
Recursive CTEs are a powerful feature in Couchbase SQL++, which are commonly found in other RDBMS. They allow for the execution of complex queries, such as traversing hierarchical and graph network data or performing iterative calculations that are difficult to express with standard SQL. However, there are limitations and considerations to be aware of when using recursive CTEs. These limitations often pertain to performance, syntax restrictions, and the complexity of queries. Here are some specifics:
Aggregates: Recursive CTEs typically do not allow aggregate functions (MIN(), MAX(), SUM(), AVG(), etc.) or DISTINCT within the recursive part of the CTE. These operations do not make sense in the context of adding rows recursively because they imply a final result set after all recursion has been resolved.
Window Functions: Like aggregate functions, window functions (ROW_NUMBER(), RANK(), etc.) are generally not used within the recursive part of the CTE. They are intended for use on a set of rows returned by the query, making them suitable for the non-recursive term or in a query selecting from the recursive CTE.
LIMIT / ORDER BY: These clauses are not allowed within the recursive member of the CTE. The reasoning is that they pertain to the final result set ordering and does not make sense within the context of building the recursive set, where intermediate results are cumulatively constructed through each iteration.
Next Steps
-
- Learn more about Couchbase SQL++
- More CTE examples in blog: Recursive Query Processing in SQL++ (N1QL)
- Get started with a free trial of Couchbase Capella