Your Guide to User-Defined Functions in Couchbase Analytics

User-defined functions put you in the driver’s seat when it comes to querying and analyzing your data.

In the Couchbase ecosystem, user-defined functions (UDFs) are reusable and parameterizable SQL++ queries that enable you to modularize queries and increase code reuse – all while giving you more control over how a particular query interacts with your business logic.

That said, I’m excited to announce that the Couchbase Server 7.0 release now supports User-Defined Functions (UDFs) in the Analytics Service.

Check out yesterday’s blog post for more information on user-defined functions in the SQL++ query language.

User-Defined Functions 101

Within Couchbase Analytics, user-defined functions have two parts:

The Function Signature: The signature consists of the function name plus its number of parameters. A function takes zero, one, or more parameters. Each function belongs to a dataverse (or Scope in 7.0 terminology) and must have a unique signature within that dataverse. This means you could create two functions within the same dataverse that have the same name but have a different number of parameters. Because the function signature is distinct (name + number of parameters), it’s allowed.
Function Body: A function body can be either an expression or a subquery and can refer to existing datasets or to other functions within the same dataverse or in a different dataverse.

Here are two examples showing the function signature – my_dataverse.GetOrders(...) – and the function body (everything between the curly braces below):

CREATE OR REPLACE ANALYTICS FUNCTION my_dataverse.GetOrders(customer_id) { 
	SELECT o
	FROM   Orders AS o
	WHERE  o.customer_id = customer_id
}

CREATE OR REPLACE ANALYTICS FUNCTION my_dataverse.GetOrders(customer_id) {

SELECT o

FROM Orders AS o

WHERE o.customer_id = customer_id

}

CREATE OR REPLACE ANALYTICS FUNCTION my_dataverse.GetOrders(customer_id, year) {
	SELECT o
	FROM   Orders AS o
	WHERE  o.customer_id = customer_id
	AND	o.year = year
}

CREATE OR REPLACE ANALYTICS FUNCTION my_dataverse.GetOrders(customer_id, year) {

SELECT o

FROM Orders AS o

WHERE o.customer_id = customer_id

AND o.year = year

}

If the CREATE ANALYTICS FUNCTION statement doesn’t provide a dataverse name, then the function is created within the active dataverse. If a dataverse is not specified, then the function is created in the “Default” dataverse.

A dataverse can be provided via a USE <<dataverse>> statement in the query or via a query_context REST API parameter.

Background: Scopes & Collections in Couchbase Analytics

Before I show you the full lifecycle of a Couchbase Analytics UDF, let’s first catch up on some important context.

Diagram A below represents sample data that could be used by a travel business to manage airline and hotel bookings.

An architecture diagram of how Couchbase Analytics parallel the Bucket, Scopes and Collections of the Data Service

Diagram A

In this architecture, we have a Bucket named travel (similar to a database in a relational database), which contains a Scope called inventory (similar to a schema in an RDBMS), and four Collections named airline, airport, hotel and route (similar to tables in an RDBMS). For each of these Collections, we have four corresponding Analytics Collections (a.k.a. datasets) and organize them within the travel.inventory Scope (a.k.a. dataverse).

A UDF Example Using Couchbase Analytics

For this example, let’s assume your data analytics team needs to frequently query which airlines within a given country fly the most routes and what their percentile ranks are.

Here’s how UDFs help out your team when completing these complex queries.

How to Manage Your UDFs

Your first step is to create the user-defined function in the Analytics Scope (or dataverse) designated as travel.inventory.

Below is the Data Definition Language (DDL) statement to create a UDF that makes your query more succinct:

CREATE OR REPLACE ANALYTICS FUNCTION travel.inventory.getAirlineRank(in_country) {
	SELECT a.id,
       	a.name AS airline,
       	count(r.id) AS routecount,
       	PERCENT_RANK() OVER (ORDER BY count(r.id)) AS `rank`
	FROM   travel.inventory.airline a
	JOIN   travel.inventory.route r ON a.id = r.airlineid
	WHERE  a.country = in_country
	GROUP BY a.id, a.name
}

CREATE OR REPLACE ANALYTICS FUNCTION travel.inventory.getAirlineRank(in_country) {

SELECT a.id,

a.name AS airline,

count(r.id) AS routecount,

PERCENT_RANK() OVER (ORDER BY count(r.id)) AS `rank`

FROM travel.inventory.airline a

JOIN travel.inventory.route r ON a.id = r.airlineid

WHERE a.country = in_country

GROUP BY a.id, a.name

}

The UDF above takes the parameter in_country as input for the country name and uses it as a filter for the query. Then, the query performs a JOIN between the airline and route Collections and performs aggregations for each airline in order to:

Count the number of airline routes
Calculate the percentile rank of route counts using the PERCENT_RANK function with the OVER clause.

(The OVER clause retrieves a specific set of rows relative to the current row and performs an aggregation over the id field. Learn more about window functions in this blog post.)

How to Evaluate Your UDFs

Next, let’s try out your brand-new UDF.

The following query determines the top three airlines for a given country (“United States” in this example).

SELECT ar.airline as Airline,
   	ROUND(ar.rank*100,2) as Rank,
   	ar.routecount as RouteCount
FROM   travel.inventory.getAirlineRank("United States") ar
ORDER  BY ar.route_count desc
LIMIT 3;

SELECT ar.airline as Airline,

ROUND(ar.rank*100,2) as Rank,

ar.routecount as RouteCount

FROM travel.inventory.getAirlineRank("United States") ar

ORDER BY ar.route_count desc

LIMIT 3;

The result of evaluating this query shows the top three airlines flying with most routes along with their percentile ranks:

[
  {
	"Rank": 100,
	"RouteCount": 2354,
	"Airline": "AMERICAN"
  },
  {
	"Rank": 96.97,
	"RouteCount": 2180,
	"Airline": "UNITED"
  },
  {
	"Rank": 93.94,
	"RouteCount": 1981,
	"Airline": "DELTA"
  }
]

[

{

"Rank": 100,

"RouteCount": 2354,

"Airline": "AMERICAN"

{

"Rank": 96.97,

"RouteCount": 2180,

"Airline": "UNITED"

{

"Rank": 93.94,

"RouteCount": 1981,

"Airline": "DELTA"

}

]

How to Drop Your UDFs

Once you no longer need your UDF, you can drop it using this DDL statement below:

DROP ANALYTICS FUNCTION travel.inventory.getAirlineRank(in_country)

1	DROP ANALYTICS FUNCTION travel.inventory.getAirlineRank(in_country)

Easy, right?

Conclusion

Reusable user-defined functions are helpful to streamline and modularize your code while also providing more flexibility for your system. I hope you find this new UDF functionality useful for your work with Couchbase Analytics.

I’d love to hear your thoughts and comments below on how you like the new UDF feature and how you’re using it. Or, get a conversation started on the Couchbase Forums. I look forward to hearing from you.

Take your own UDFs for a spin with Couchbase Analytics:
Test drive Couchbase 7 today

Idris Motiwala

Platform

Self-Managed

Services

Capabilities

Why Couchbase?

Migrate to Capella

By Use Case

By Industry

By Application Need

Popular Docs

By Developer Role

COMMUNITY

Join the Developer Community

Resource Center

Education

Compare

About

Partnerships

Our Services

Partners: Register a Deal

Ready to register a deal with Couchbase?

Marriott

All Posts