SQL++ queries can access data stored in your Couchbase cluster in a variety of ways. There are situations where having the business logic as part of your data queries can also be beneficial. SQL++ supports this with User Defined Functions (UDFs) that have been available since Couchbase 7.0.
In this blog post, we create a UDF in JavaScript that queries points from a user’s location, dynamically, using SQL++. Additionally, we perform the same distance calculations with a UDF in Python within the Analytics service.
Our geospatial query use case
Our application will generate geographic points of interest from our database that are near a user’s GPS location, similar to services like Google Maps, shown in the screenshot below. For this example, we will use the travel-sample dataset that is available in a sample bucket provided by Couchbase.
In particular, we are interested in seeing the landmarks and airports around the user’s current location. This cannot be achieved using a SQL++ query directly as the distance calculation is based on the real-time geographic location of the user. SQL++ supports defining UDFs in JavaScript to perform custom logic in the queries.
Calculating distances from GPS coordinates
There are many ways to calculate the distance between two sets of GPS coordinates. In this example, we will calculate the distance using the Haversine Formula. It gives the approximate distance between two GPS coordinates by considering the path to be a sphere rather than a straight line distance.
JavaScript code to calculate geographic distances is shown in this sample:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
function degreesToRadians(degrees) { return degrees * Math.PI / 180; } function distanceInKmBetweenEarthCoordinates(lat1, lon1, lat2, lon2) { var earthRadiusKm = 6371; var dLat = degreesToRadians(lat2-lat1); var dLon = degreesToRadians(lon2-lon1); lat1 = degreesToRadians(lat1); lat2 = degreesToRadians(lat2); var a = Math.sin(dLat/2) * Math.sin(dLat/2) + Math.sin(dLon/2) * Math.sin(dLon/2) * Math.cos(lat1) * Math.cos(lat2); var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a)); return earthRadiusKm * c; } |
We define two JavaScript functions–one that performs the conversion between degrees and radians and another that calculates the distance in kilometers between the GPS coordinates from the other function.
Importing UDFs into Couchbase
These JavaScript functions can now be imported into Couchbase using the REST API, as shown below with the curl command:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
curl -v -X POST http://localhost:8093/evaluator/v1/libraries/math -u <user>:<password> -d 'function degreesToRadians(degrees) { return degrees * Math.PI / 180; } function distanceInKmBetweenEarthCoordinates(lat1, lon1, lat2, lon2) { var earthRadiusKm = 6371; var dLat = degreesToRadians(lat2-lat1); var dLon = degreesToRadians(lon2-lon1); lat1 = degreesToRadians(lat1); lat2 = degreesToRadians(lat2); var a = Math.sin(dLat/2) * Math.sin(dLat/2) + Math.sin(dLon/2) * Math.sin(dLon/2) * Math.cos(lat1) * Math.cos(lat2); var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a)); return earthRadiusKm * c; }' |
After this step, the UDF can be defined in the web console’s Query Editor:
1 2 3 |
CREATE FUNCTION degreesToRadians(a) LANGUAGE JAVASCRIPT AS "degreesToRadians" AT "math" CREATE FUNCTION distanceInKmBetweenEarthCoordinates(lat1, lon1, lat2, lon2) LANGUAGE JAVASCRIPT AS "distanceInKmBetweenEarthCoordinates" AT "math" |
Here, the math reference is to the JavaScript library that we created to evaluate the UDF.
The UDF can now be tested with sample GPS coordinates in SQL++ using the Execute Function as shown below
1 |
EXECUTE FUNCTION distanceInKmBetweenEarthCoordinates(51.5, 0, 38.8, -77.1) |
We can observe that the function works as intended when we provide the GPS coordinates manually to the function.
Connecting the UDF to Couchbase data
In the travel-sample dataset, we have GPS coordinates of the landmarks and airports along with other places of interest like hotels.
We can integrate them into our queries like the one below:
1 2 3 4 5 6 |
SELECT distanceInKmBetweenEarthCoordinates(a.geo.lat, a.geo.lon, 51.509865, -0.118092) AS distance, a.airportname, a.city FROM `travel-sample`.inventory.airport a ORDER BY distance ASC LIMIT 10; |
This query returns a list of the ten closest airports to the user’s location (51.509865, -0.118092). We provide the latitude (a.geo.lat) and longitude (a.geo.lon) fields that are embedded in the documents using the power of SQL++.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 |
[ { "airportname": "All Airports", "city": "London", "distance": 0.6998675034052988 }, { "airportname": "Waterloo International", "city": "London", "distance": 0.7880158040048914 }, { "airportname": "London St Pancras", "city": "London", "distance": 2.289359875405007 }, { "airportname": "Euston Station", "city": "London", "distance": 2.30782110865356 }, { "airportname": "St Pancras Railway Station", "city": "London", "distance": 2.582290289682563 }, { "airportname": "Paddington Station", "city": "London", "distance": 4.069442660124984 }, { "airportname": "London Heliport", "city": "London", "distance": 6.062824964656381 }, { "airportname": "Elstree", "city": "Elstree", "distance": 8.735152174563803 }, { "airportname": "City", "city": "London", "distance": 12.009592036043564 }, { "airportname": "London - Kings Cross", "city": "London", "distance": 16.891716659500467 } ] |
Similarly, we can calculate the points of interest around the user using the landmark collection in increasing distance from the user:
1 2 3 4 5 6 7 |
SELECT distanceInKmBetweenEarthCoordinates(l.geo.lat, l.geo.lon, 51.509865, -0.118092) AS distance, l.activity, l.city, l.content FROM `travel-sample`.inventory.landmark l ORDER BY distance ASC LIMIT 10; |
Results of the query showing nearby landmarks:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
[ { "activity": "see", "city": "London", "content": "Somerset House is home to three art galleries: The exceptional '''Courtauld Institute''' displays a collection of 19th and 20th century art, including well-known works by Degas, Matisse and Kandinsky; The Gilbert Collection presents a collection of decorative art; and The Hermitage Rooms, the most recent addition to Somerset House, hosts temporary exhibitions of works on loan from the Hermitage Museum in Saint Petersburg. The central courtyard is filled with fountains in the Summer, but in the Winter, an ice rink is installed, it is very popular, so visitors should book in advance, or expect to wait a long time to skate.", "distance": 0.10940067520415872 }, { "activity": "see", "city": "London", "content": "Cleopatra's Needle originated in the ancient [[Egypt]]ian city of [[Cairo/Heliopolis|Heliopolis]], in the Temple of Atum, but the Romans moved it to [[Alexandria]] in 12 BC. In 1819, viceroy Mehemet Ali presented Cleopatra's Needle to the British, commemorating military victories in Egypt, but it remained in Alexandria until 1877 when transportation was arranged to bring it to London. On the voyage, the ship capsized in a storm, killing six crewmembers. Cleopatra's Needle was thought to be lost, but Spanish trawlers found it afloat a few days later, and after some repairs, it arrived in London on 21 Jan 1878. The obelisk is flanked by two faux-sphinxes, which show the effects of bombings of London during World War II. Today, Cleopatra's Needle shows some wear from exposure to London's damp weather.", "distance": 0.2153782246329736 }, { "activity": "buy", "city": "London", "content": "Daily second-hand book sale near the bank of the Thames. A nice place to just browse for books (classic and modern), maps and prints.", "distance": 0.329776385402355 }, { "activity": "see", "city": "London", "content": "[[London]] (in [[London/Covent Garden|Covent Garden]])", "distance": 0.34889537479151833 }, { "activity": "see", "city": "London", "content": "London's most famous and popular skateboarding area, situated partly underneath Queen Elizabeth Hall along Queen's Walk and the Thames. Also popular with graffiti artists, BMXers and so forth.", "distance": 0.36487940944981834 }, { "activity": "see", "city": "London", "content": "Tucked under Waterloo Bridge, BFI Southbank, formerly known as the National Film Theatre, pitches itself as the home of film and has three screens showing classic (including silent), foreign language and less mainstream films. Recently redeveloped, it now has a new entrance on Theatre Alley, a shop dedicated to film, an interactive exhibition space and an excellent bar/restaurant at the back. Visitors can also access the Mediatheque - wind your way through the BFI's extensive film and TV archive for free. Cool bar and restaurant. Tickets are generally available on the spur of the moment.", "distance": 0.378692262237853 }, { "activity": "see", "city": "London", "content": "Music venue hosting daily performances. | image=Queen Elizabeth Hall.jpg", "distance": 0.3859430181613397 }, { "activity": "drink", "city": "London", "content": "The antidote to gay bars: a pub-like atmosphere and great music. | image=The Retro Bar.jpg", "distance": 0.39732030942983415 }, { "activity": "see", "city": "London", "content": "Three large auditoriums, the Olivier, the Lyttelton and the Cottesloe. The Olivier theatre is the largest with an open stage and a fan shaped auditorium around it. This ensures that all seats provide a good view. Most of the more 'popular' productions are put on here as the space provided is much larger than most theatres. The Lyttelton theatre is more traditional with a procenium arc with good views from most seats. The Cottesloe is a small studio theatre, seating around 400. Some tickets will be available on the day, either day seats (arrive before 09:30 and queue) or standby (arrive before 6PM and queue), or you can buy online. Popular shows, especially those around Christmas in the Olivier sell out months in advance. Tickets to The National Theatre are generally better value than most other theatres. There is also the '£10 Travelex' season in the summer that provides a large number (over 100,000 seats a year) at £10. Booking in advance is required for these. There is also an exhibition space in the Lyttelton theatre foyer that frequently holds popular photographic exhibitions. Free jazz is often played in the evening in the ground floor foyer. During summer there is a free daily outdoor festival of performance, music, comedy and cabaret known as Watch This Space. Deckchairs (and artificial grass) are provided to watch on. Free exhibitions. Backstage tours £5. (http://www.nationaltheatrelondon.com/tickets/)", "distance": 0.42625112040817054 }, { "activity": "drink", "city": "London", "content": "Free nightly music events. The best place to sample underground electro, indie, dub-step and more.", "distance": 0.4323026974543284 } ] |
If we run these queries for multiple users at the same time, we might run into performance issues as we are using the compute resources that are part of the Couchbase cluster.
In such scenarios, Couchbase Analytics could reduce the impact on your cluster. Couchbase Analytics is designed to efficiently run complex queries over many records. Complex queries can include large ad hoc join, set, aggregation, and grouping operations–any of which may result in long-running queries, high CPU usage, high memory consumption, or excessive network latency due to data fetching and cross node coordination.
User Defined Functions with Couchbase Analytics
Couchbase Analytics allows us to define and use User Defined Functions in Python but at the time of writing requires you to enable Developer Preview mode in Couchbase Server. This can be done with the following command:
1 2 |
$ /opt/couchbase/bin/couchbase-cli enable-developer-preview \ --enable -c localhost:8091 -u <username> -p <password> |
The next step is to create a Python package, locally, on the development environment with the Python UDF. In this case, the UDF is a method to calculate the distance between two GPS coordinates.
1 2 3 4 5 6 7 |
# distance.py from geopy import distance class distance_calculation: def calculate_distance(self, lat1, lon1, lat2, lon2) -> float: """Calculate Distance using geodesic distance""" return distance.distance((lat1, lon1), (lat2, lon2)).km |
Here, we calculate the geodesic distance (the shortest distance between points along a curved path) between the two GPS coordinates with the help of a library, geopy.
Packaging the UDF
To package the library, we use a package shiv that can package the code along with its requirements for any platform. Here, we are using Linux as Couchbase Server is running on a Linux environment within Docker.
1 |
shiv -o distance.pyz --site-packages . --platform manylinux2010_x86_64 --python-version 39 --only-binary=:all: geopy |
In order to upload this binary package with the UDF to Couchbase, we need to use the REST API for the Analytics service.
1 |
curl -X POST -u <username>:<password> -F "type=python" -F "data=@./distance.pyz" localhost:8095/analytics/library/Default/pylib |
This uploads the packaged UDF into the pylib library in the default scope (formerly dataverse) of the Analytics environment. We can now define the UDF in the Analytics Workbench.
This definition statement indicates that we are defining a UDF named distance_in_km that can be called from the Python function calculate_distance defined in class distance_calculation within the Python module distance.
1 2 |
CREATE ANALYTICS FUNCTION distance_in_km(lat1, lon1, lat2, lon2) AS "distance", "distance_calculation.calculate_distance" AT pylib; |
Now we can use the UDF in our Analytics queries the same way we used the UDF in our SQL++ queries.
1 2 3 4 5 6 7 8 |
SELECT distance_in_km(51.5, 0, 38.8, -77.1) AS "distance" RESULTS: [ { "distance": 5933.5299530300545 } ] |
Mapping Data from Data Service in Analytics Service
In order to query the data in the Data Service from the Analytics Service, we need to map the travel-sample data collections in Analytics that creates a real-time shadow copy of the data in the Data Service in Analytics. For this example, we need to map the collections with geographical data, namely the landmark, airport and hotel collections from the inventory scope in the travel-sample bucket.
Running the Analytics UDF against Couchbase Data
Now, we can run the same queries that we were running before in SQL++ but with the Analytics Service. Only the UDF name has changed. The rest of the interface is similar to what we had with the SQL++ queries. The results will also be similar to the earlier results.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
SELECT distance_in_km(a.geo.lat, a.geo.lon, 51.509865, -0.118092) AS distance, a.airportname, a.city FROM `travel-sample`.inventory.airport a ORDER BY distance ASC LIMIT 10; SELECT distance_in_km(l.geo.lat, l.geo.lon, 51.509865, -0.118092) AS distance, l.activity, l.city, l.content FROM `travel-sample`.inventory.landmark l ORDER BY distance ASC LIMIT 10; |
This approach is suitable when we want to run these queries without impacting the Data Service which is commonly used for our transactional data. The data is synced between the Data and the Analytics Service internally in real-time in Couchbase.
Summary
In this blog post, you’ve learned to create a User Defined Function (UDF) in JavaScript that calculates distances between two GPS coordinates. You saw how to import the UDF into Couchbase and then integrate it into a SQL++ query to power applications that could provide points of interest around a user. We also showcased how you can perform the same distance calculation in a Python-based UDF using the Analytics service to reduce impact on your transactional Couchbase cluster.
For further reading and reference, see the following resources: