You may know Couchbase as the fastest, most scalable NoSQL platform available. However, did you know that Couchbase can also be a relational database? Couchbase is a data platform that can operate with multiple simultaneous personalities all in the same platform. It can be structured or unstructured. And it can be transactional or analytical. In this blog, we are going to look at moving data from Oracle to Couchbase.
Source Oracle Database
For a source database, we are going to use Swingbench. It is a synthetic workload generator for Oracle. It is self-contained and creates schemas that are reminiscent of what might be found in an actual production application. For this blog, we are going to use the Simple Order Entry (SOE) schema that is created by the utility. A visual representation of the SOE schema can be seen below. Generating the schema with a scale parameter of 1 results in a dataset with about 15 million records.
Data Import
To migrate the data from Oracle to Couchbase, we will use the cbperf demo tool. This utility primarily generates randomized synthetic schemas for Couchbase, but it also can import data into Couchbase via a plugin from external data sources. The Oracle plugin for cbperf will replicate all tables available to the connecting user, copy associated table data, and for each table, it will extract the columns that are indexed to recreate the indexes on Couchbase.Â
1 |
% bin/cb_perf import -h cbhost -b soe -s soe -P oracle -V connect=soe/soe@orahost/testdb |
Assuming that the SOE schema was set up with the default username and password, which is soe and soe, respectively, this syntax will export the data from the database SID testdb on host orahost and import it to the Couchbase bucket soe with scope soe where one node in the cluster is cbhost.
The Data in Couchbase
Let’s get the important disclaimer out of the way up front. For a production application, you will likely not want to just “lift and shift” the data and indexes. Instead, an analysis needs to be done to determine the optimal configuration in Couchbase along with the required application changes. The ultimate level of effort will vary by application.
That said, once the cbperf import is complete, there will be about 15 million documents in Couchbase. The soe scope will have collections that correspond to each table in Oracle.
Here are the tables in Oracle:
Here are the collections in Couchbase:
Each row that was in the Oracle database is now a document in Couchbase. The document key is constructed from the collection name plus a colon “:” plus the sequential document number (basically the row number). The document JSON keys are the table column names, and the values are the column values for that row.
1 |
select * from CUSTOMERS where CUSTOMER_ID = 1 ; |
1 |
select * from soe.soe.customers where customer_id = 1 ; |
Finally, we can see all the indexes that were automatically generated based on the indexed columns in the Oracle database. To allow for the most query flexibility, each index covers one column.Â
Running a Query
Looking through the Swingbench source code, I found the following query:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
SELECT tt.order_total,   tt.sales_rep_id,   tt.order_date,   customers.cust_first_name,   customers.cust_last_name FROM   (SELECT orders.order_total,     orders.sales_rep_id,     orders.order_date,     orders.customer_id,     rank() Over (Order By orders.order_total DESC) sal_rank   FROM orders   WHERE orders.sales_rep_id = 1   ) tt,   customers WHERE tt.sal_rank <= 10 and customers.customer_id = tt.customer_id |
Running this query against the Oracle database, I get the following results:
And running the query on Couchbase, I get the same results. The only thing I had to change in the query was to prefix the collections with soe.soe. to construct a Couchbase key space.
Fun Fact
You may be familiar with the ORACLE_SID as the Oracle database instance identifier, but do you know what it does under the covers? It stands for the Site Identifier. It is hashed with ORACLE_HOME to get an SGA (System Global Area) key which is the shared memory key on Unix/Linux systems. When you get an ORA-01034: ORACLE not available error, it is because this shared memory key is not present on the system. Thus you can’t communicate with the database. Fortunately, working with Couchbase is vastly simpler, allowing you to focus on other things. Unless, of course, you like operating system internals.
Conclusion
While it might seem like a daunting task, importing data from a relational database into Couchbase is not that difficult. To be fair, there is more to consider than just columns and indexes, such as stored procedures (user-defined functions in Couchbase), but it is certainly very doable.