M. David Allen is a full-stack software engineer and entrepreneur who for more than a decade has been working with just about every different programming language and different type of database system he could get his hands on. David has previously worked across many industries including finance, healthcare, and government, typically focusing on large-scale data integration challenges, transitioning applied research, and new technology innovation. He holds a masters degree from Virginia Commonwealth University but since leaving formal education remains a permanent student of technology.
Setup
For this tutorial, we’ll be using the beer-sample dataset that comes with Couchbase so it will be easy to follow along. If you don’t yet have Couchbase installed, you can get going very quickly by having docker and following the easy instructions on the couchbase docker image page to set up a quick instance of Couchbase for testing. Simply running the command docker run -d –name db -p 8091-8094:8091-8094 -p 11210:11210 Couchbase with Couchbase installed gets us started. After following a few setup prompts described on that page, we’re ready to roll with some data.
Use Cases – What is Paging, and why do it?
Let’s consider a very common needs; in our hypothetical front-end beer application, we want to show our users a table of beers and let them rate each one. The problem is that the database has over 5,000 beers in it. It’d be a waste to send the whole dataset to the browser every time, and would really slow down the loading of your web application’s page.
Much better would be to show the user only the first 10 beers. Then, by either clicking a “Next” button to go to the next page, or even better by using an infinite scroll plugin (such as ngInfiniteScroll for angular or react-infinite for react) the user can be shown progressively more of the records from the database.
Show me the Code!
1 2 3 4 5 6 7 8 9 |
SELECT name, category, abv from `beer-sample` WHERE brewery_id is not missing ORDER BY name OFFSET 0 LIMIT 10; |
This returns the data you’d expect:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
[ { "abv": 0, "category": "North American Lager", "name": "#17 Cream Ale" }, { "abv": 0, "name": "#40 Golden Lager" }, { "abv": 0, "category": "Other Style", "name": "#42 Cream Ale" }, /* (...) */ |
To get the second page of results, you only need to add the LIMIT to the OFFSET. For example, OFFSET 10 LIMIT 10 would be the second page. To generalize this, how many records should you skip over with OFFSET in order to get to the nth page? It will always be n * page size.
Using this simple math, we can generalize this to any scenario. What if we wanted to show 20 results at a time, and the user wanted to jump straight to the 11th page? No problem, it’s just LIMIT 20 OFFSET 20 * 11. Application endpoints then, whether they’re searching the database, or just listing beers by name, can use the exact same query logic each time. No matter how complex the query is, just adjust LIMIT and OFFSET and we’re all on the same page.
How does this work?
The magic here is the combination of three N1QL clauses: LIMIT, OFFSET, and ORDER BY. Let’s take a quick look at each, because they’re all needed to get the job done.
- LIMIT does just what it says. It limits how many records Couchbase will return, and effectively enforces our page size (we only want 10 beers at a time, never 5,000)
- OFFSET tells Couchbase how many records to jump over before it begins returning documents. If you’re familiar with SQL, it behaves just like SKIP in SQL. The OFFSET clause is how we move forward through the entire data set one chunk at a time.
- ORDER BY guarantees a certain order returned by the database. This is important because Couchbase doesn’t guarantee any particular order of the results unless you ask for it. In our previous query, the #17 Cream Ale appeared as the first result when ordering by name. If we didn’t use ORDER BY, we wouldn’t be able to predict where in the set the #17 Cream Ale would appear, and it might be possible it could appear on more than one page depending on how Couchbase executed the query!
Put simply, imagine Couchbase building a list of all of the beers ordered by name (ORDER BY). Pagination is just taking batches of 10 of them (LIMIT 10). Each page then starts by skipping through the entire ordered list to the starting point of the next page (OFFSET).
EXPLAIN yourself, Couchbase.
One final example; as with any N1QL query, you can gain a lot of insight into how it works by just putting “EXPLAIN” in front of it. This tells Couchbase to tell you what it’s planning to do.
1 2 3 4 5 6 7 8 9 |
EXPLAIN SELECT name, category, abv from `beer-sample` WHERE brewery_id is not missing ORDER BY name OFFSET 0 LIMIT 10; |
(Try this query for yourself!) The output is rather long, so we won’t paste it here, but let’s summarize what Couchbase is doing with this query.
- It first does a primary scan of the beer-sample bucket, and fetches documents.
- In parallel, it filters for:
- Records that have a brewery_id (this tell us the record is a beer, not a brewery document)
- It projects out the name, category, and abv properties (what we asked for)
- It then orders the results by name
- The offset is applied next, effectively skipping the first X records and not returning them at all.
- Finally, it applies the LIMIT by reading the next n records, and ignoring all those after it.
When are we done?
If we’re moving through our data one page at a time, we also need to know when to stop. There are two ways of approaching this; on one hand, if we keep advancing through the pages indefinitely, we’ll eventually run out of data. On the other hand, you could just pre-compute how many pages are present, and then iterate through them until you hit that number.
What happens if we ask for the 7th page of 1,000 results, knowing full well that there isn’t any such document set?
1 2 3 4 5 6 7 8 9 |
SELECT name, category, abv from `beer-sample` WHERE brewery_id is not missing ORDER BY name OFFSET 7 * 1000 LIMIT 1000; |
Well, never fear — if you go past the end, you’ll just get an empty array.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
{ "results": [], "metrics": { "elapsedTime": "465.446327ms", "executionTime": "465.384101ms", "resultCount": 0, "resultSize": 0, "sortCount": 5891 } } |
For the record, the empty array is also what you’ll get if you ask for nonsense, such as:
1 2 3 4 5 6 7 8 9 |
SELECT name, category, abv from `beer-sample` WHERE brewery_id is not missing ORDER BY name OFFSET -5 LIMIT -10; |
Wait….what?!?! This query is asking to start 5 records before the beginning of the output set, and return no more than -10 records. 5 records before the beginning is nothing. And no more than -10 records is nothing. So you get….nothing!
This behaviour is very convenient; it’s clear when you’re out of records, and it also plays nicely with a lot of other assumptions at different layers of software. If you’ve ever used REST APIs that supported paging, this is generally how they work. For example, you can easily imagine a REST API endpoint that looks like this: http://cool-app.com/beers?limit=10&page=6000. If you wrote such an endpoint in your back-end, you can easily see how that would generate a corresponding N1QL query, and even if the user gave you a funky page number, your endpoint would do the right thing.
But what if I want to know ahead of time how many pages there will be?
This is straightforward as well. If we know that each page has n records in it, and we know the total number of records, then we can divide the total by n, take the ceiling of that number, and that will tell us how many potential pages are in Couchbase.
1 2 3 |
SELECT CEIL(count(*) / 10) from `beer-sample` WHERE brewery_id is not missing |
This yields on my local database 590. Because there are exactly 5,891 beer documents, that means there will be 589 sets of 10, and the final page will only have 1 document on it. Because 5,891 / 10 = 589.1, we use the N1QL CEIL function to get an integer number of pages, since whatever front-end application isn’t going to display 0.1 of a page.
Keep in mind that we still need the WHERE clause here, to cut down the documents to only beer documents, excluding brewery and other documents. If the WHERE was missing, you’d get the wrong number for sure.
This post is part of the Couchbase Community Writing Program