Diacritic-insensitive LIKE query

Hi,

I want to know how I can write a query that returns all matching results without distinguishing between characters that carry diacritical marks and their unmarked counterparts.

For example, I want to find all the records that contain iphóne, iphoné, iphonè or iphone by searching only for “%iphone%”. I want the query to be diacritic-insensitive, but I have not found anything about this in the documentation.

Thanks

N1QL will not do fuzzy matching. You can consider FTS (Full Text Search).

Using FTS would certainly help you in this situation, since it can perform a fuzzy search (based on edit distance). Note that for this you’ll first need to define a Full Text Search index over your Couchbase bucket. You can either make the index use-case specific by defining it just for the field of interest, or set up a dynamic default index, which indexes everything.

Here’s our documentation to aid you in this …
https://docs.couchbase.com/server/6.5/fts/fts-creating-indexes.html

Once you have an index defined, your full text query could look like this …

curl -XPOST -H "Content-Type: application/json" \
  "http://<username>:<password>@<ip>:8094/api/index/<fts_index_name>/query" -d '
    {
        "query": {
            "match": "iphone",
            "field": "<field_name>",
            "fuzziness": 1
        }
    }
'

The query above would match iphóne, iphoné, iphonè and iphone, since each of them is within an edit distance of 1 of “iphone”.

Optionally, you can use the fr (French), es (Spanish) or en (English) analyzers while defining your index if you think they’ll better suit your use case.

Hi @couchbase_fan,

A more text-specific and faster way of doing this would be to use the “asciifolding” character filter.
You need to create a custom analyser from the FTS web console like below. This one contains only the minimum parts needed for this demo.

And use this custom analyser for the field to be indexed like below.
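
For illustration, the two steps above (define a custom analyser, then assign it to the field) might correspond to an index mapping along these lines. This is only a sketch: the analyser name `fold_diacritics` and the field name `name` are made up, and the key layout reflects my understanding of the index-definition JSON rather than anything confirmed in this thread, so compare it against the definition the web console shows for your own index.

```python
import json

# Hypothetical mapping fragment; "fold_diacritics" and "name" are
# illustrative names, not taken from the thread.
mapping = {
    "analysis": {
        "analyzers": {
            "fold_diacritics": {
                "type": "custom",
                # Character filter that folds accented characters to ASCII.
                "char_filters": ["asciifolding"],
                "tokenizer": "unicode",
                "token_filters": ["to_lower"],
            }
        }
    },
    "default_mapping": {
        "enabled": True,
        "dynamic": False,
        "properties": {
            "name": {
                "enabled": True,
                "dynamic": False,
                "fields": [
                    {
                        "name": "name",
                        "type": "text",
                        # Assumed: the field refers to the analyser by name.
                        "analyzer": "fold_diacritics",
                        "index": True,
                    }
                ],
            }
        },
    },
}

# Print the fragment as JSON so it can be compared with a real definition.
print(json.dumps(mapping, indent=2))
```

With an analyser like this applied at both index and query time, a plain match query for "iphone" on that field would not need a fuzziness setting.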

This would make all those diacritic variations searchable.
Please note that the asciifolding character filter is available in the 6.5.0 release.
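
To give a feel for what folding does to the text, here is a small Python sketch. It is not the FTS implementation, only an approximation using the standard `unicodedata` module: it decomposes each character and drops the combining marks. The helper name `fold_diacritics` is made up for this example, and a real ASCII-folding filter also handles characters that do not decompose this way (for instance "ø").

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Approximate ASCII folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


variants = ["iphóne", "iphoné", "iphonè", "iphone"]

# After folding and lowercasing, every variant collapses to one term.
folded = {fold_diacritics(v).lower() for v in variants}
print(folded)  # {'iphone'}
```

Because the same folding is applied to the indexed text and to the search text, all of the variants end up as the same term and an exact term lookup is enough.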

The problem with the edit-distance (fuzzy query) approach is that it won’t scale when the search text contains more diacritic characters (more than 2, which is very common), and it won’t give the fastest query-time performance.
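
The scaling point can be checked with a plain edit-distance calculation. The sketch below is a textbook Levenshtein implementation in Python, not the algorithm FTS uses internally, and the word "téléphoné" is just an example I picked: every marked character costs one edit, so a word with three of them is already out of reach for a fuzziness of 1 or 2.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance over code points."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def distance_to_plain(plain: str, accented: str) -> int:
    # Normalise to NFC so each accented letter is a single code point.
    return levenshtein(plain, unicodedata.normalize("NFC", accented))


one_mark = distance_to_plain("iphone", "iphóne")
three_marks = distance_to_plain("telephone", "téléphoné")
print(one_mark, three_marks)  # 1 3
```

A folding analyser avoids this because the number of marked characters no longer matters once they are all mapped to their plain form.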

Cheers!


Thanks, I’m trying FTS and it works. :wink: