[This blog was syndicated from http://blog.grallandco.com]
If you have to deal with a large number of documents when doing queries against a Couchbase cluster it is important to use pagination to get rows by page. You can find some information in the documentation in the chapter “Pagination“, but I want to go in more details and sample code in this article.
For this pagination example I will start by creating a simple view based on the beer-sample dataset, the view is used to find brewery by country:
if (doc.type == “brewery” && doc.country){
emit(doc.country);
}
}
This view list all the breweries by country, the index looks like:
Doc id | Key | Value |
---|---|---|
bersaglier | Argentina | null |
cervecera_jerome | Argentina | null |
brouwerij_nacional_balashi | Aruba | null |
australian_brewing_corporation | Australia | null |
carlton_and_united_breweries | Australia | null |
coopers_brewery | Australia | null |
foster_s_australia_ltd | Australia | null |
gold_coast_brewery | Australia | null |
lion_nathan_australia_hunter_street | Australia | null |
little_creatures_brewery | Australia | null |
malt_shovel_brewery | Australia | null |
matilda_bay_brewing | Australia | null |
… | … | … |
… | … | … |
… | … | … |
yellowstone_valley_brewing | United States | null |
yuengling_son_brewing | United States | null |
zea_rotisserie_and_brewery | United States | null |
fosters_tien_gang | Viet Nam | null |
hue_brewery | Viet Nam | null |
So now you want to navigate in this index with a page size of 5 rows.
Using skip / limit Parameters
The most simplistic approach is to use limit and skip parameters for example:
Page 1 : Â ?limit=5&skip0
Page 2: Â ?limit=5&skip=5
…
Page x: Â ?limit=5&skip(limit*(page-1))
You can obviously use any other parameters you need to do range or key queries (startkey/endkey, key, keys) and sort option (descending).
This is simple but not the most efficient way, since the query engine has to read all the rows that match the query, until the skip value is reached.
Some code sample in python that paginate using this view :
This application loops on all the pages until the end of the index.
As I said before this does not represent pagination best practices since the system must read all the values until the skip is reached. The following example shows a better way to deal with this.
Using startkey / startkey_docid parameters
- The startkey parameter will be the value of the key where the query should start to read (based on the last key of the “previous page”
- Since for a key for example “Germany” you may have one or more ids (documents) it is necessary to say to Couchbase query engine where to start, for this you need to use the startkey_docid parameter, and ignore this id since it is the last one of the previous page.
Row num | Doc id | Key | Value |
---|---|---|---|
Query for page 1 : ?limit=5 | |||
1 | bersaglier | Argentina | null |
2 | cervecera_jerome | Argentina | null |
3 | brouwerij_nacional_balashi | Aruba | null |
4 | australian_brewing_corporation | Australia | null |
5 | carlton_and_united_breweries | Australia | null |
Query for page 2: ?limit=5&startkey=”Australia”&startkey_docid=carlton_and_united_breweries&skip=1 | |||
6 | coopers_brewery | Australia | null |
7 | foster_s_australia_ltd | Australia | null |
8 | gold_coast_brewery | Australia | null |
9 | lion_nathan_australia_hunter_street | Australia | null |
10 | little_creatures_brewery | Australia | null |
Query for page 3 : limit=5&startkey=”Australia”&startkey_docid=little_creatures_brewery&skip=1 | |||
11 | malt_shovel_brewery | Australia | null |
12 | matilda_bay_brewing | Australia | null |
… | … | … | … |
… | … | … | … |
… | … | … | … |
… | yellowstone_valley_brewing | United States | null |
… | yuengling_son_brewing | United States | null |
… | zea_rotisserie_and_brewery | United States | null |
… | fosters_tien_gang | Viet Nam | null |
… | hue_brewery | Viet Nam | null |
So as you can see in the examples above, the query uses the startkey, a document id, and just passes it using skip=1.
Let’s now look at the application code, once again in Python
Views with Reduce function
Couchbase Java SDK Paginator
In the previous examples, I have showed how to do pagination using the various query parameters. The Java SDK provides a Paginator object to help developers to deal with pagination. The following example is using the same view with the Paginator API.
So as you can see you can easily paginate on the results of a Query using the Java Paginator.
- At the line #37, the Paginator is created from using the view and query objects and a page size is specified
- Then you just need to use the hasNext() and next() methods to navigate in the results.
The Java Paginator  is aware of the fact that they query is using a reduce or not, so you can use it with all type of queries – Internally it will switch between the skip/limit approach and the doc_id approaches. You can see how it is done in the Paginator class.
Note that if you want to do that in a Web application between HTTP request you must keep the Paginator object in the user session since the current API keeps the current page in its state.
Conclusion
In this blog post you have  learned how to deal with pagination in Couchbase views; to summarize
- The pagination is based on some specific parameters that you send when executing a query.
- Java developers can use the Paginator class that simplifies pagination.
FROM tutorial
WHERE age > 30
LIMIT 2
OFFSET 2
If you want to learn more about N1QL:
Lots of useful info, great!
One thing I noticed about pages (and your article mentions) is that the time to retrieve page X is proportional to the distance X is from 0. For example, if I have 100,000 rows, and my pages are 1,000 each (giving me 100 pages), and I retrieve page 100, it will take the same amount of time as retrieving the entire 100,000. You have a relatively elaborate workaround noted for the simplicity of the task – perhaps this is functionality that can be baked-in to the view page logic by default?
What if viewQuery keys(=startKey) are UNIQUE each other, then I think no need to use startkey_docid. Am I right?
And more… startkey_docid is independent to view query key, so it seems that it may take some cost to find out starting point by start_key_docid, from ordered view quiery index – must scan from starting point till meet docid.
Hello, how can i get totalPage with pagination.