Couchbase Mobile 2.0 introduces powerful Full Text Search (FTS) capabilities on your JSON documents. This is part of the new Query interface based on N1QL, Couchbase’s declarative query language that extends SQL for JSON. If you are familiar with SQL, you will feel right at home with the semantics of the new API.
Full Text Search enables natural language querying. This is the third in a series of posts that discusses the query interface in Couchbase Lite. This blog assumes you are familiar with the fundamentals, so if you haven’t done so already, be sure to review the earlier post first. If you are interested, links to blogs discussing other features of the Query interface are provided at the end of this post.
You can download the latest pre-release version of Couchbase Mobile 2.0 from here.
Background
If you were using 1.x versions of Couchbase Mobile, you are probably familiar with Map-Views for creating indexes and queries. In 2.0, you no longer have to create views and map functions! Instead, a simple interface allows you to create indexes, and you can use a Query Builder interface to construct your queries. The new query interface is simpler to use and much more powerful in comparison. We will explore some of its features in this post.
Sample Project
While the examples discussed here use Swift for iOS, note that barring some minor differences, the same query interface is supported on the Android and Windows platforms as well.
So with some minor tweaks, you should be able to reuse the query examples in this post when working with other platforms.
Follow the instructions below if you are interested in a sample Swift project:
- Clone the iOS Swift Playground from GitHub:
$ git clone https://github.com/couchbaselabs/couchbase-lite-ios-api-playground
- Follow the installation instructions in the corresponding README file to build and execute the playground.
Sample Data Model
We shall use the Travel Sample database located here. You can embed this pre-built database into your mobile application and start using it for your queries.
The sample data set includes several types of documents, as identified by the type property in the document. We will focus on documents of type “landmark”. The JSON document model is shown below. For brevity, we have omitted some properties that are not relevant to this post.
{
  "activity": "see",
  "address": "84 rue Claude Monet",
  "alt": "Fondation Claude Monet",
  "city": "Giverny",
  "content": "the house is quietly eccentric and highly interesting in an Orient-influenced style, and includes Monet's collection of [http://www.intermonet.com/japan/ Japanese prints]. There are no original Monet paintings on the site - the real drawcard, is the gardens around the house ...",
  "country": "France",
  "directions": null,
  "email": null,
  "geo": {
    "accuracy": "ROOFTOP",
    "lat": 49.0753489,
    "lon": 1.5337884
  },
  "hours": "open April-October Mo-Su 9:30-18:00",
  "id": 10061,
  "image": null,
  "name": "Monet's House",
  "phone": "+33 232512821",
  "price": "€9, $5 students, €4 4.00 disabled, under-7s free",
  "state": "Haute-Normandie",
  "title": "Giverny",
  "tollfree": null,
  "type": "landmark",
  "url": "http://www.fondation-monet.com/"
}
** Refer to the model above for each of the query examples below. **
The Database Handle
In the queries below, we will use the Database API to open or create a Couchbase Lite database.
// kDBName holds the database name (defined elsewhere in the playground)
let options = DatabaseConfiguration()
let db = try Database(name: kDBName, config: options)
The Basics
Full Text Search enables natural language querying. In our post on the Query Fundamentals, we discussed the like and regex expressions for pattern matching operations. FTS supersedes that capability by adding support for stemming, relevance-based ranking and locale-specific natural language querying.
Full Text Searches are case-insensitive and use the match query expression. In order to perform FTS, you must first create a Full Text Index on the appropriate properties. An index can cover one or more properties.
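To make the idea of a full text index a little more concrete, here is a toy, in-memory model in plain Swift. It is not how Couchbase Lite builds its index: `ToyFullTextIndex` is a hypothetical type that simply maps each lower-cased word to the IDs of the documents containing it, which is enough to show why a case-insensitive term lookup is cheap once the index exists. It splits on whitespace only and does no stemming or stop-word handling.

```swift
// Toy model only: maps each lower-cased word to the IDs of the documents containing it.
// ToyFullTextIndex is a hypothetical type, not part of Couchbase Lite.
struct ToyFullTextIndex {
    private var postings: [String: Set<String>] = [:]

    // Split on whitespace and lower-case each word so lookups are case-insensitive.
    private static func tokens(_ text: String) -> [String] {
        text.split(whereSeparator: { $0.isWhitespace }).map { $0.lowercased() }
    }

    // Record every word of `text` as occurring in document `docID`.
    mutating func add(docID: String, text: String) {
        for token in ToyFullTextIndex.tokens(text) {
            postings[token, default: []].insert(docID)
        }
    }

    // Return the IDs of documents containing `term`, ignoring case.
    func match(_ term: String) -> Set<String> {
        postings[term.lowercased()] ?? []
    }
}

var toyIndex = ToyFullTextIndex()
toyIndex.add(docID: "landmark_33234", text: "even a mechanical bull for those ready")
toyIndex.add(docID: "landmark_26144", text: "a WWII-era military defense mechanism")
print(toyIndex.match("Mechanical"))  // ["landmark_33234"]
```

Note that without stemming, “Mechanical” does not find “mechanism”; that is exactly the gap stemming fills.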
Stemming
Before we proceed with the examples, first a word on stemming. Stemming is the process of reducing words to their root, or stem. So, for instance, “cats” is reduced to “cat”, and “mechanical”, “mechanics” and “mechanisms” are all reduced to a common stem. As a result, searching for the term “cats” would also match “cat”, and searching for “mechanical” would also match “mechanism”.
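A real stemmer applies a long list of language-specific rules. The sketch below is a deliberately tiny, hypothetical suffix stripper (`toyStem`), not the stemmer Couchbase Lite uses, but it shows the core idea: words that share a stem collapse to the same index term, so a search for one finds the others.

```swift
// Toy suffix stripper, NOT the stemmer Couchbase Lite uses: it only knows a handful of
// English suffixes, enough to show how "mechanical" and "mechanisms" meet at one stem.
func toyStem(_ word: String) -> String {
    let lower = word.lowercased()
    // Longest suffixes first, so that "isms" is tried before "s".
    let suffixes = ["isms", "ical", "ism", "ics", "ic", "s"]
    for suffix in suffixes where lower.hasSuffix(suffix) {
        let stem = String(lower.dropLast(suffix.count))
        // Refuse to strip a word down to fewer than three characters.
        if stem.count >= 3 { return stem }
    }
    return lower
}

let words = ["Mechanical", "mechanics", "mechanisms", "mechanism", "cats"]
print(words.map(toyStem))  // ["mechan", "mechan", "mechan", "mechan", "cat"]
```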
Couchbase Lite currently supports stemming in the following languages:
* danish
* dutch
* english
* finnish
* french
* german
* hungarian
* italian
* norwegian
* portuguese
* romanian
* russian
* spanish
* swedish
* turkish
If no language is specified, the tokenizer still breaks the text into words at Unicode whitespace characters. So full text search should still work, although less well, with any language that puts spaces between words.
Full Text Index
The name that is associated with the index during creation is important: the query examples that we will see later refer to the appropriate index via that name.
Single Property Index
The following example creates a fullTextIndex on the “content” property of a Document. Stemming is enabled by default, and the language is assumed to be that of the device's locale. While not shown below, you also have the option of specifying whether accents should be ignored via the ignoreAccents option. By default, accents are not ignored.
let ftsIndex = IndexBuilder.fullTextIndex(items: FullTextIndexItem.property("content"))
try db.createIndex(ftsIndex, withName: "ContentFTSIndex")
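The ignoreAccents option is easiest to picture with an example. The following sketch uses Foundation's `folding(options:locale:)` to compare strings with diacritics and case removed. `foldAccents` is a hypothetical helper, and this only approximates what an accent-insensitive index does; it is not Couchbase Lite's implementation.

```swift
import Foundation

// Hypothetical helper: strip diacritics and case using Foundation, to illustrate
// accent-insensitive matching. This is not Couchbase Lite's implementation.
func foldAccents(_ text: String) -> String {
    text.folding(options: [.diacriticInsensitive, .caseInsensitive], locale: nil)
}

let indexedWord = "René"  // as it appears in the sample data
let searchTerm = "rene"   // as a user might type it

// Case-insensitive only: the accent still makes the words differ.
print(indexedWord.lowercased() == searchTerm)              // false
// Accent- and case-insensitive: the words compare equal.
print(foldAccents(indexedWord) == foldAccents(searchTerm))  // true
```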
Multiple Property Index
The following example creates a fullTextIndex on the “content” and “name” properties of a Document.
let ftsIndex = IndexBuilder.fullTextIndex(items: FullTextIndexItem.property("content"),
                                                 FullTextIndexItem.property("name"))
try db.createIndex(ftsIndex, withName: "ContentAndNameFTSIndex")
Index without stemming
The following example creates a fullTextIndex on the “content” property of a Document with stemming disabled. Stemming is enabled by default using the current device language settings; setting the language to nil disables stemming.
// Passing nil as the language disables stemming for this index
let ftsIndex = IndexBuilder.fullTextIndex(items: FullTextIndexItem.property("content")).language(nil)
try db.createIndex(ftsIndex, withName: "ContentFTSIndexNoStemming")
FTS Search with Stemming
The query below fetches the id and content properties of documents of type “landmark” that contain the term “Mechanical” in the “content” property. We use the “ContentFTSIndex” that was created earlier.
Request
// `limit` is the maximum number of results to return
let ftsExpression = FullTextExpression.index("ContentFTSIndex")
let searchQuery = QueryBuilder
    .select(SelectResult.expression(Meta.id),
            SelectResult.expression(Expression.property("content")))
    .from(DataSource.database(db))
    .where(
        Expression.property("type").equalTo(Expression.string("landmark"))
            .and(ftsExpression.match("Mechanical")))
    .limit(Expression.int(limit))
Sample Response
The response to the above query will include documents that contain the terms “mechanical”, “mechanism”, “mechanisms”, “mechanic” and so on.
[
  {
    "id": "landmark_21703",
    "content": "The Swiss luxury watch manufacturer, founded in 1851, is known for precise mechanics."
  },
  {
    "id": "landmark_2592",
    "content": "Here you can see the mechanisms that drive San Francisco's famed cable cars, as well as plenty of cable car memorabilia and information on the history of the cable cars."
  },
  {
    "id": "landmark_26144",
    "content": "This scenic section of the Golden Gate National Recreation Area is a favorite for hikers, bikers and beach-goers, with rugged coastal highlands and deep sand dunes. Hang gliding is quite popular here, with several shops for hang gliders in the area. Nearby is the remnants of Battery Davis, a WWII-era military defense mechanism."
  },
  {
    "id": "landmark_33234",
    "content": "Western-style steakhouse features a huge, ‘country’ bar and even a mechanical bull for those ready for a faux bull-riding adventure. The menu is vast and surprisingly inexpensive. And again, the saloon is a longtime trendy destination along the Sunset Strip for raucous good times."
  }
]
FTS Search without Stemming
The query below fetches the id and content properties of documents of type “landmark” that contain the exact term “Mechanical” in the “content” property. We use the “ContentFTSIndexNoStemming” that was created earlier, which disables stemming.
Request
let ftsExpression = FullTextExpression.index("ContentFTSIndexNoStemming")
let searchQuery = QueryBuilder
    .select(SelectResult.expression(Meta.id),
            SelectResult.expression(Expression.property("content")))
    .from(DataSource.database(db))
    .where(Expression.property("type").equalTo(Expression.string("landmark"))
        .and(ftsExpression.match("Mechanical")))
    .limit(Expression.int(limit))
Sample Response
The response to the above query will include documents that contain exactly the term “mechanical” in it. Note again that all searches are case insensitive.
[
  {
    "id": "landmark_33234",
    "content": "Western-style steakhouse features a huge, ‘country’ bar and even a mechanical bull for those ready for a faux bull-riding adventure. The menu is vast and surprisingly inexpensive. And again, the saloon is a longtime trendy destination along the Sunset Strip for raucous good times."
  }
]
FTS Search on Multiple Properties
The query below fetches the id, name and content properties of documents of type “landmark” that contain the term “Mechanical” in either the “name” or the “content” property. We use the “ContentAndNameFTSIndex” that was created earlier, which indexes both the “name” and “content” properties.
Request
let ftsExpression = FullTextExpression.index("ContentAndNameFTSIndex")
let searchQuery = QueryBuilder
    .select(SelectResult.expression(Meta.id),
            SelectResult.expression(Expression.property("name")),
            SelectResult.expression(Expression.property("content")))
    .from(DataSource.database(db))
    .where(Expression.property("type").equalTo(Expression.string("landmark"))
        .and(ftsExpression.match("Mechanical")))
    .limit(Expression.int(limit))
Sample Response
The response to the above query will include documents that contain the term “mechanical” (or variants of it derived through stemming) in either the “name” or “content” property.
[
  {
    "id": "landmark_10062",
    "name": "Natural Mechanical Museum",
    "content": "Founded by the Guillemard brothers: Jean-pierre, René and Gérard currently run restorations and exhibitions with the help of an enthusiasts team who devoted their time and known-how to the Patrimony preservation. The museum origin is a private collection of steam internal combustion engines; founded in 1955 by the Guillemard family a GIVERNY residents since generations."
  },
  {
    "id": "landmark_21703",
    "name": "Patek Philippe Salons",
    "content": "The Swiss luxury watch manufacturer, founded in 1851, is known for precise mechanics."
  },
  {
    "id": "landmark_25929",
    "name": "Cable Car Museum",
    "content": "Here you can see the mechanisms that drive San Francisco's famed cable cars, as well as plenty of cable car memorabilia and information on the history of the cable cars"
  },
  {
    "id": "landmark_26144",
    "name": "Fort Funston",
    "content": "This scenic section of the Golden Gate National Recreation Area is a favorite for hikers, bikers and beach-goers, with rugged coastal highlands and deep sand dunes. Hang gliding is quite popular here, with several shops for hang gliders in the area. Nearby is the remnants of Battery Davis, a WWII-era military defense mechanism"
  }
]
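Conceptually, indexing several properties means a document matches when the search term appears in any of them. The toy model below illustrates that in plain Swift. `Landmark`, `words` and `search` are hypothetical names, and because the model does no stemming, “mechanics” in landmark_21703 is not matched here, unlike with the stemmed ContentAndNameFTSIndex.

```swift
// Toy model of an index over two properties: a document's searchable words are the
// union of the words in its name and its content. No stemming is performed.
struct Landmark {
    let id: String
    let name: String
    let content: String
}

// Split on whitespace and lower-case, so matching is case-insensitive.
func words(_ text: String) -> Set<String> {
    Set(text.split(whereSeparator: { $0.isWhitespace }).map { $0.lowercased() })
}

// IDs of landmarks whose name OR content contains `term` as a whole word.
func search(_ landmarks: [Landmark], for term: String) -> [String] {
    let needle = term.lowercased()
    return landmarks
        .filter { words($0.name).union(words($0.content)).contains(needle) }
        .map { $0.id }
}

let landmarks = [
    Landmark(id: "landmark_10062", name: "Natural Mechanical Museum",
             content: "The museum origin is a private collection of steam internal combustion engines"),
    Landmark(id: "landmark_21703", name: "Patek Philippe Salons",
             content: "The Swiss luxury watch manufacturer, founded in 1851, is known for precise mechanics."),
]

print(search(landmarks, for: "Mechanical"))  // ["landmark_10062"]
```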
FTS Search with Logical Expressions
In an earlier example, you saw that by disabling stemming, you can look for the exact search string. But what if you wanted to look for more than one search term? The match query expression accepts logical expressions, including AND and OR.
The query below fetches the id and content properties of documents of type “landmark” that contain the term “Mechanical” or “Mechanism” in the “content” property. We use the “ContentFTSIndexNoStemming” that was created earlier, which disables stemming.
Request
let ftsExpression = FullTextExpression.index("ContentFTSIndexNoStemming")
let searchQuery = QueryBuilder
    .select(SelectResult.expression(Meta.id),
            SelectResult.expression(Expression.property("content")))
    .from(DataSource.database(db))
    .where(Expression.property("type").equalTo(Expression.string("landmark"))
        .and(ftsExpression.match("Mechanical OR Mechanism")))
    .limit(Expression.int(limit))
Sample Response
The response to the above query will include documents that contain exactly the terms “mechanical” or “mechanism” in the “content” property.
[
  {
    "id": "landmark_26144",
    "content": "This scenic section of the Golden Gate National Recreation Area is a favorite for hikers, bikers and beach-goers, with rugged coastal highlands and deep sand dunes. Hang gliding is quite popular here, with several shops for hang gliders in the area. Nearby is the remnants of Battery Davis, a WWII-era military defense mechanism"
  },
  {
    "id": "landmark_33234",
    "content": "Western-style steakhouse features a huge, ‘country’ bar and even a mechanical bull for those ready for a faux bull-riding adventure. The menu is vast and surprisingly inexpensive. And again, the saloon is a longtime trendy destination along the Sunset Strip for raucous good times."
  }
]
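The behaviour of a flat OR (or AND) search string such as “Mechanical OR Mechanism” can be modelled in a few lines of Swift. `matches(_:words:)` is a hypothetical helper that only handles a single kind of operator and exact, already-tokenized words; it does not attempt operator precedence, parentheses or stemming.

```swift
import Foundation

// Hypothetical helper: evaluate a flat "a OR b" or "a AND b" search string against the
// set of (lower-cased) words in one document. Mixed operators and parentheses are not handled.
func matches(_ query: String, words: Set<String>) -> Bool {
    if query.contains(" OR ") {
        // OR: at least one term must be present.
        return query.components(separatedBy: " OR ")
            .contains { words.contains($0.lowercased()) }
    }
    // AND (or a single term): every term must be present.
    return query.components(separatedBy: " AND ")
        .allSatisfy { words.contains($0.lowercased()) }
}

let bullDocument: Set<String> = ["a", "mechanical", "bull"]
let fortDocument: Set<String> = ["military", "defense", "mechanism"]

print(matches("Mechanical OR Mechanism", words: bullDocument))   // true
print(matches("Mechanical OR Mechanism", words: fortDocument))   // true
print(matches("Mechanical AND Mechanism", words: bullDocument))  // false
```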
FTS Search with Wildcard Expression
You can use the “*” character in the search string to represent zero or more character matches.
The query below fetches the id and content properties of documents of type “landmark” that contain the term “walt*” in the “content” property. This matches all terms that start with “walt” followed by zero or more characters. We use the “ContentFTSIndex” that was created earlier.
NOTE: One could argue that using a wildcard in the search term is a naive way of implementing stemming. However, a wildcard can also match words that are unrelated to the stem (for example, “walt*” matches both “Walter” and “Waltham”), so the results may not correspond to the terms derived through stemming. It is preferable to use stemming if that is what you need.
Request
let ftsExpression = FullTextExpression.index("ContentFTSIndex")
let searchQuery = QueryBuilder
    .select(SelectResult.expression(Meta.id),
            SelectResult.expression(Expression.property("content")))
    .from(DataSource.database(db))
    .where(Expression.property("type").equalTo(Expression.string("landmark"))
        .and(ftsExpression.match("walt*")))
    .limit(Expression.int(limit))
Sample Response
The response to the above query will include documents that contain the terms “walt”, “Walter”, “Waltham”, “Walthamstow” and so on.
[
  {
    "id": "landmark_10134",
    "content": "On the Ibrox tour, you get access to the home dressing room and hear a recorded message from Walter Smith and Ally McCoist before climbing the marble staircase, visit the illustrious trophy room, the blue room and the managers office. Tickets, except for matches against Celtic, are available online from the clubs website, ticket centre at the stadium and club outlets at JJB Sports Stores in Glasgow city centre."
  },
  {
    "id": "landmark_16104",
    "content": "Presents the history of Waltham Forest. The building was constructed to be a work house and has since been used as a police station and a private home. Its collection includes the Bremer car, built by engineer Frederick Bremer in 1892 it has a claim to being this first petrol-driven car made in Britain."
  },
  {
    "id": "landmark_16105",
    "content": "The ancient nucleus of present day Walthamstow centred around the 12 th century St.Marys Church "
  },
  {
    "id": "landmark_16574",
    "content": "Impressive hall architecture complete with tours most days.The Dorothy Chandler Pavilion is open to the public Christmas Eve day with almost round the clock performances by amateur cultural arts groups.The Walt Disney Hall has daily tours ,check website for schedules."
  },
  {
    "id": "landmark_8631",
    "content": "Museum about famous Scottish authors, focussing on Robert Burns, Sir Walter Scott and Robert Louis Stevenson "
  }
]
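A trailing “*” amounts to a prefix match on each word. The sketch below is a hypothetical `wildcardMatches` helper that models just that case for “walt*”; it is not how Couchbase Lite evaluates wildcards internally.

```swift
// Hypothetical helper modelling a trailing "*" wildcard: a pattern ending in "*" matches
// any word that starts with the text before it; otherwise the word must match exactly.
// Both sides are lower-cased, since searches are case-insensitive.
func wildcardMatches(_ pattern: String, word: String) -> Bool {
    let p = pattern.lowercased()
    let w = word.lowercased()
    if p.hasSuffix("*") {
        return w.hasPrefix(String(p.dropLast()))
    }
    return w == p
}

let candidates = ["Walt", "Walter", "Waltham", "Walthamstow", "Wales"]
print(candidates.filter { wildcardMatches("walt*", word: $0) })
// ["Walt", "Walter", "Waltham", "Walthamstow"]
```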
FTS Search with Stop Words
Stop words are common words in a language. In English, these are terms like “the”, “is”, “and”, “which” and so on.
Example 1: Search String contains stop words
Couchbase Lite ignores stop words that appear in the search string.
The query below fetches the id and content properties of documents of type “landmark” that match the search string “on the history” in the “content” property. We use the “ContentFTSIndex” that was created earlier.
Couchbase Lite ignores the stop words “on” and “the”, so you would fetch documents that include the term “history” and derived forms of that stem word.
Request
let ftsExpression = FullTextExpression.index("ContentFTSIndex")
let searchQuery = QueryBuilder
    .select(SelectResult.expression(Meta.id),
            SelectResult.expression(Expression.property("content")))
    .from(DataSource.database(db))
    .where(Expression.property("type").equalTo(Expression.string("landmark"))
        .and(ftsExpression.match("on the history")))
    .limit(Expression.int(limit))
Sample Response
The response to the above query will include documents that contain the term “history” and derived forms of this word, such as “historical”.
[
  {
    "id": "landmark_10019",
    "content": "Museum on military engineering and the history of the British Empire. A quite extensive collection that takes about half a day to see. Of most interest to fans of British and military history or civil engineering."
  },
  {
    "id": "landmark_10083",
    "content": " Tours take about 45 min. In front the building, George Square, the citys notional centre, is populated by several statues of civic leaders and famous figures from history and is often used for outdoor events."
  },
  {
    "id": "landmark_10093",
    "content": "The auditorium has now garnered some world fame for being the place where the Susan Boyle audition - one of the most downloaded YouTube video clips in history - was filmed."
  },
  {
    "id": "landmark_10101",
    "content": "This museum has a large collection of artifacts and exhibits showcasing the history of the city. If you don't want to pay to enter the museum itself, you can just walk into the building (which contains three separate museums) and look at some historical photographs on the walls of the atrium."
  },
  {
    "id": "landmark_10105",
    "content": "The Peoples Palace is a great folk museum, telling the history of Glasgow and its people, from various perspectives, displaying details of Glasgow life (including one of Billy Connolly's banana boots). The Winter Gardens, adjacent, is a pleasant greenhouse with a reasonable cafe.)"
  }
]
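Dropping stop words from the search string can be pictured as a simple filter. `sampleStopWords` below is a small hypothetical list built from the examples in this post, not the list Couchbase Lite actually uses, and `significantTerms` is likewise illustrative.

```swift
// Hypothetical stop-word list, taken from the examples in this post only.
let sampleStopWords: Set<String> = ["on", "the", "is", "and", "which"]

// Split the search string on whitespace, lower-case each word and drop stop words.
func significantTerms(_ search: String) -> [String] {
    search.split(whereSeparator: { $0.isWhitespace })
        .map { $0.lowercased() }
        .filter { !sampleStopWords.contains($0) }
}

print(significantTerms("on the history"))  // ["history"]
```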
Example 2: Ignoring Stop Words while Searching
By default, Couchbase Lite ignores stop words within the search content.
The query below fetches the id and content properties of documents of type “landmark” that match the search string “blue fin yellow fin” in the “content” property. We use the “ContentFTSIndex” that was created earlier.
Couchbase Lite ignores stop words during search, so you would fetch documents that include the terms “blue”, “fin” and “yellow” in that order, separated by any number of stop words.
Request
let ftsExpression = FullTextExpression.index("ContentFTSIndex")
let searchQuery = QueryBuilder
    .select(SelectResult.expression(Meta.id),
            SelectResult.expression(Expression.property("content")))
    .from(DataSource.database(db))
    .where(Expression.property("type").equalTo(Expression.string("landmark"))
        .and(ftsExpression.match("blue fin yellow fin")))
    .limit(Expression.int(limit))
Sample Response
The response to the above query will include documents that contain the terms “blue”, “fin” and “yellow” separated by any number of stop words, such as “blue fin and yellow fin”.
[
  {
    "id": "landmark_18840",
    "content": "This large aquarium specializes in exhibiting local sea life in typical local habitat displays, and has many spectacular exhibits. It is particularly known for its Kelp Forest exhibit, three stories high, filled with several varieties of giant kelp and a wide variety of marine animal species, and also for its million-gallon Open Sea exhibit with large blue fin and yellow fin tunas, mahi-mahis, sharks (including an occasional Great White Shark as a very temporary visitor, before being released back to the ocean), ocean sunfish (mola-molas) and sea turtles. The best exhibits include a large tank of silver sardines that swim around and around above one's head, and one of rescued sea otters deemed unreturnable to the wild and therefore kept at the aquarium. )"
  }
]
FTS Search with Ranking
You can use FullTextFunction.rank to specify the rank order of the search results. This is useful for ordering the matches from best to worst.
The query below fetches the id and content properties of documents of type “landmark” that contain the term “attract” in the “content” property. The documents are sorted in descending order of rank, which means that the document with the most matches is sorted ahead of the rest.
Request
// Use the stemming index so that "attract" also matches "attraction" and "attractions"
let ftsExpression = FullTextExpression.index("ContentFTSIndex")
let searchQuery = QueryBuilder
    .select(SelectResult.expression(Meta.id),
            SelectResult.expression(Expression.property("content")))
    .from(DataSource.database(db))
    .where(Expression.property("type").equalTo(Expression.string("landmark"))
        .and(ftsExpression.match("attract")))
    // Rank against the same index that the match expression uses
    .orderBy(Ordering.expression(FullTextFunction.rank("ContentFTSIndex")).descending())
    .limit(Expression.int(limit))
Sample Response
The response to the above query will include documents that include the term “attract” or derived versions of it. Documents with the maximum number of matches are sorted higher.
[
  {
    "id": "landmark_22056",
    "content": "Top paid-for visitor attraction in Wales including a farm, indoor vintage funfair, zoo and indoor and outdoor adventure play. All-weather family attraction with 50% of attractions undercover."
  },
  {
    "id": "landmark_16309",
    "content": "The London Bridge Experience and London Tombs are two scare attractions for one price and have been voted the UK's Best Year Round Scare Attraction for three years running."
  },
  {
    "id": "landmark_25216",
    "content": "TA seaside amusement park located near the southern end of Mission Beach, Belmont Park is a landmark with a number of shops, restaurants, an arcade, and a bunch of rides. The big attraction is the Giant Dipper, a historic roller coaster that is one of the only two remaining oceanfront roller coasters still operating on the west coast. Among the other rides is a FlowRider (a simulated wave attraction which you can bodyboard on), an antique carousel, bumper cars, slides, pendulum rides, tilt-a-whirl, and a trampoline "
  },
  {
    "id": "landmark_1059",
    "content": "Aims to tell the history of flight throughout the 20th Century, and has a large collection of aircraft, including British Airways Concorde G-BOAA. Another rather good attraction (and well worth the look) is the De-Havilland Comet 4C, a derivative of the Worlds first jetliner."
  }
]
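To see why more matches push a document up the list, here is a simplified ranking sketch. It scores each document by counting words that start with “attract” (a crude stand-in for stemming) and sorts by that score, highest first. Couchbase Lite's rank function may weigh matches differently; `Hit`, `matchCount` and `ranked` are hypothetical names used only for illustration.

```swift
// Simplified ranking sketch: a document's score is the number of its words that start
// with `stem` (a crude stand-in for stemmed matches); documents are sorted by score, highest first.
struct Hit {
    let id: String
    let content: String
}

func matchCount(of stem: String, in text: String) -> Int {
    text.split(whereSeparator: { $0.isWhitespace })
        .filter { $0.lowercased().hasPrefix(stem) }
        .count
}

func ranked(_ hits: [Hit], by stem: String) -> [String] {
    hits.sorted { matchCount(of: stem, in: $0.content) > matchCount(of: stem, in: $1.content) }
        .map { $0.id }
}

let hits = [
    Hit(id: "landmark_1059",
        content: "Another rather good attraction (and well worth the look) is the De-Havilland Comet 4C, a derivative of the Worlds first jetliner."),
    Hit(id: "landmark_16309",
        content: "The London Bridge Experience and London Tombs are two scare attractions for one price and have been voted the UK's Best Year Round Scare Attraction for three years running."),
    Hit(id: "landmark_22056",
        content: "Top paid-for visitor attraction in Wales including a farm, indoor vintage funfair, zoo and indoor and outdoor adventure play. All-weather family attraction with 50% of attractions undercover."),
]

print(ranked(hits, by: "attract"))
// ["landmark_22056", "landmark_16309", "landmark_1059"]
```

With these three excerpts, the order agrees with the relative order of the same documents in the sample response above.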
Limitations
While the FTS capability in Couchbase Lite 2.0 is extremely powerful and should suffice for use cases typical of an embedded database, there are a few limitations:
- Match expressions can only appear at the top level, or within a top-level AND expression. This means that the following expression is not allowed: ftsExpression.match("attract").or(ftsExpression2.match("museum"))
- Custom Language Tokenizers: The list of supported languages was specified earlier. At the time of writing this post, you cannot plug in a custom tokenizer to extend support to other languages.
- Fuzzy Search Support: You cannot specify a “fuzziness” factor on the query that may result in less relevant matches being considered.
- Facets: There is no support for faceted search.
Bear in mind that Couchbase Lite is an embedded database, so one could argue that its FTS capabilities do not have to be as extensive as those of a server-side database. Support for these features will be evaluated in future releases.
What Next
This blog post looked at how you can leverage the Full Text Search (FTS) capabilities in the new Query API in Couchbase Mobile 2.0. This is a start. Expect to see more functionality in future releases. You can download the latest release from our downloads page.
Here are a few other Couchbase Mobile Query related posts that may be of interest:
– This blog post discusses the fundamentals
– This blog post discusses how to query array collections
– This blog post discusses how to do JOIN queries
If you have questions or feedback, please leave a comment below, or feel free to reach out to me on Twitter @rajagp or by email at priya.rajagopal@couchbase.com. The Couchbase Forums are another good place to reach out with questions.