What is PL/SQL?

PL/SQL is a procedural language designed specifically to embrace SQL statements within its syntax. It includes procedural language elements such as conditions and loops, and can handle exceptions (run-time errors).

PL/SQL is native to Oracle databases, and databases like IBM DB2, PostgreSQL, and MySQL support PL/SQL constructs through compatibility features.

What is a JavaScript UDF?

JavaScript UDF is Couchbase’s alternative to PL/SQL.

JavaScript UDF brings JavaScript’s general-purpose scripting flexibility to databases, allowing for dynamic and powerful operations across modern database systems and enhances flexibility in data querying, processing, and transformation.

Most modern databases like Couchbase, MongoDB, Snowflake, and Google BigQuery support JavaScript UDF.

The problem

A common problem seen by users migrating from Oracle to Couchbase is porting their PL/SQL scripts. Instead of supporting PL/SQL, Couchbase lets users construct user-defined functions in JavaScript (supported since 2021).

JavaScript UDFs allow easy, intuitive manipulation of variant and JSON data. Variant objects passed to a UDF are transformed to native JavaScript types and values.

The unintended consequence of this is that the majority of RDBMS that have been in existence for the last ten years have strongly encouraged developers to access the database using their procedural extensions to SQL (PL/pgSQL, PL/SQL), which support procedural constructs, integration with SQL, error handling, functions and procedures, triggers, and cursors, or at the very least, functions and procedures (like Sakila). For any attempt to move away from them, all of their scripts would need to be rewritten.

Rewriting code is often a tedious task, especially when dealing with PL/SQL scripts that have been written in the 2000s and maintained since then. These scripts can be complex, often extending to thousands of lines, which can be overwhelming for the average enterprise user.

Solution

The ideal approach would be to develop a whole new PL/SQL evaluator, but that would require an excessive amount of engineering hours, and for the same use case, we already have a modern, stable, and fast JSEvaluator; So why support another evaluator?

This makes the problem a perfect use-case to leverage the ongoing advances in AI and LLMs. And that’s what we have done here. We have used Generative AI models to automate the conversion of PL/SQL to JSUDF.

As of June 2024, models have a limited context window, which means longer PL/SQL get hit with the error:  

Note, that this is for GPT4.

So do we wait for AI to become more powerful and allow more tokens (like Moore’s Law but for the AI’s context-length-vs-precision)?

No, that’s where ANTLR, a parser generator tool, comes in. ANTLR is well-known to be used for Compiler and Interpreter Development. That way we can break the big script to smaller units that can be translated independently.

So are we now building a transpiler? Well, yes and no.

Stages in a transpiler:

    1. Lexical Analysis (Tokenization)
    2. Syntactic Analysis (Parsing)
    3. Semantic Analysis
    4. Intermediate Representation (IR) Generation
    5. Optimization (Optional)
    6. Target Code Generation

How the AI translator works

Above steps 1,2 are done using ANTLR. We use ANTLR’s Listener interface to grab individual Procedure/Function/Anonymous-block,  as they are independent blocks of code. In a case where the  Procedure/Function/Anonymous-block are themselves exceeding the context window, we translate at a statement level (where the LLM assumes the existence of use of variables/function calls that aren’t defined here but somewhere before).

Subsequently, steps 3, 4, 5, and 6 are left to the LLM (e.g., GPT), i.e., translating each PL/SQL block into a JavaScript function to the best of its ability that also preserves the operational semantics of the block and is syntactically accurate.

The results are surprisingly quite positive; the translation is 80-85% accurate.

Another benefit of the solution is that we reduce hallucination by focusing on one task at a time, resulting in more accurate translations.

To visualize:

automate the conversion of PL/SQL to JSUDF

How to use the tool

The executable expects the following command-line arguments:

-u: capella signin email
-p: capella signin password
-cpaddr: capella-url for chat-completions api
-orgid: organisation id in the chat-completions api path
-cbhost: node-ip: cbcluster node
-cbuser: cluster-user-name: cbcluster user, added through database-acess
-cbpassword: cluster-password: cbcluster password, added through database-access
-cbport: query-service tls port: usually 18093
filepath , i.e path to the PL/SQL script that has to be translated
output-> In the output directory, a file with the same name as the plsql file is generated with translated JavaScript Library code.

An example:

cat example1.sql

To briefly explain the above script:

    • An outer loop runs for 4 iterations, incrementing x by 1000, counter by 1
    • Inner loop runs for 4 iterations, incrementing x by 1, counter by 1

Running the translator on the test PL/SQL:

Output JSUDF:

cat output/example1.js

The translated script has a function nestedloop (name generated by LLM) that does exactly what the original anonymous PL/SQL block specifies.

Side note: for named functions/procedures translated JS functions will have the same name.  For anonymous blocks LLM uses a name it comes up with.

Known issues

PL/SQL and JS are 2 different languages, and the way they are supported in Oracle and Couchbase doesn’t allow for a clean direct mapping between the 2. 

Below are some limitations we discovered and the workarounds we have implemented for the same:

​​1. console.log is not supported

DBMS_OUTPUT.PUT builtin procedure and other 2 similar builtins, DBMS_OUTPUT.PUT_LINE and DBMS_OUTPUT.NEW_LINE are translated to console.log(), but console.log is a browser API and is not supported by Couchbase’s JavaScript evaluation implementation. This has been a frequent ask, considering the Couchbase eventing function does support print() statements but not in JavaScript UDFs.

Workaround:

Users are expected to create a logging bucket.

Logs are inserted as part of a document INSERT into default.default collection. The document would look something like:

User can lookup his logs by selecting on logging:

An example:

The original PL/SQL

Translated to JavaScript UDF

This is already implemented in the tool.


To view the log:

2. Cross Package function calls

Procedures/Functions listed in the package specification are Global, and can be used from other packages via «package_name».«public_procedure/function». But the same is not true for a JavaScript Library in Couchbase, as import-export constructs are not supported by Couchbase’s JavaScript evaluation implementation.

Workaround:

In case of an interlibrary function call «lib_name».«function»(), user is expected to have the referenced library «lib_name» already created; you can verify this via GET /evaluator/v1/libraries

The referenced function «function» also is expected to be created as a global UDF; this can be verified via GET /admin/functions_cache or select system:functions keyspace. This way we can access the function via SQL++/N1QL.

An example:

math_utils Package

show_sum Package

Translated code:

It is auto-handled by the program, with a warning that it should be verified by a human set of eyes!

3. Global Variables

PL/SQL supports package level and session level global variables, but global variables are not supported in JSUDF deliberately by design as this causes concern for memory leaks.

Suggested workaround requires manual tweaking of the generated translation. For example:


Any function  that modifies a global variable must accept it as an argument and return it to the caller.

increment_counter:

Any function that only reads a global can accept it as an argument.

show_globals:

Package to Library

This section shows an end-to-end package-to-library conversion using the tool.

Sample PL/SQL package:

Translation:

Code:

Let’s insert a new employee document

Create employee collection:

Insert an employee:

This errors out, and that’s ok we can fix it manually.

Reading the reason and exception: Cannot INSERT non-string key 1 of type value.intValue, Ah! the key is always expected to be a string, passing insert_employee(“1”, “joe”, “briggs”, 10000) would do the trick, but it is unintuitive to expect employee_id to be a string.

Let’s alter the generated code:

And recreate the UDF:

Trying to insert again:

Update an employee:

Shoot! There’s a goof up, employee 1 isn’t Joe, it’s Emily. 

Let’s update employee 1

View the employee:

Delete the employee:

Emily left.

Again an error with the generated code, looking at the reason and exception we can confirm that translated code encloses delete in a transaction, which wasn’t the case in the original.

For transactions, buckets need to have durability set, but this requires more than one data server, hence the error.

The fix here is to alter the code to remove the enclosing translation:

Now, all functions in the original PL/SQL work in Couchbase via JavaScript UDFs. Yes, the example is pretty trivial but you get the gist on how to go about using the tool to migrate your PL/SQL scripts with little manual supervision.

Remember the tool is supposed to take you 80% , the other 20% still needs to be done by you, but much better than writing all of that code yourself!

The future

This project is open source, so feel free to contribute. Some ideas that have been suggested:

    1. Critic AI that can criticize generated code to ensure manual intervention is not needed at all
    2. Currently the source code is code that just works; no thoughts for parallelism or code reuse were put to use.

And also include the limitations discussed earlier.

Resources

Finally, I’d like to thank Kamini Jagtiani for guiding me and Pierre Regazzoni for helping me test the conversion tool.

Author

Posted by Gaurav Jayaraj - Software Engineer

Gaurav Jayaraj is an intern in the Query team at Couchbase R&D. Gaurav is pursuing his Bachelors in Computer Science from PES University, Bangalore.

Leave a reply