Hi there,
We are implementing a simple full-text search in our application, which is backed by a Couchbase database.
As many of our users are French, we need ASCII folding to make the index accent-insensitive.
I found this issue pending on the Bleve repository:
opened 12:19PM - 11 Jul 18 UTC
closed 07:31PM - 09 Dec 18 UTC
enhancement
Hello,
I've searched the documentation and the source code but have not found the response.
My goal is to allow bleve to retrieve documents that contain (for example):
- Belvédère
- belvédère
- belvedere
- belvedère
- Belvedere
with the same score through a string query like "belvedere".
I've used the to_lower token_filter to handle case, but found nothing to remove or ignore accents.
Any ideas?
Which states that:
[…] though some of the language specific analyzers include a filter which folds specific accent characters likely to appear in a particular language.
Any insight on how to get such a filter working within Couchbase? We tried nearly all of the French-specific filters, without any success so far …
Thanks,
steve
December 5, 2018, 4:37pm
Hi – one quick thought that comes to mind (I have not tried it myself, though!): one or more regexp character filters might be able to replace accented characters with their plain ASCII equivalents. Cheers, steve
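To make the regexp idea concrete outside of Couchbase, here is a minimal Go sketch using only the standard library. It is not Bleve's or Couchbase's API: `foldRule`, `frenchFoldRules`, `foldAccents` and `normalize` are hypothetical names, and the rule table is an assumed, deliberately incomplete list of French accented characters. Each rule loosely plays the role of one regexp character filter (a pattern plus a replacement), applied in order before lowercasing.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// foldRule pairs a pattern that matches accented characters with the
// plain ASCII text that should replace them (hypothetical type).
type foldRule struct {
	pattern *regexp.Regexp
	replace string
}

// frenchFoldRules is an assumed, incomplete table covering common French
// accented letters in precomposed (NFC) form only.
var frenchFoldRules = []foldRule{
	{regexp.MustCompile(`[àâä]`), "a"},
	{regexp.MustCompile(`[ÀÂÄ]`), "A"},
	{regexp.MustCompile(`[éèêë]`), "e"},
	{regexp.MustCompile(`[ÉÈÊË]`), "E"},
	{regexp.MustCompile(`[îï]`), "i"},
	{regexp.MustCompile(`[ÎÏ]`), "I"},
	{regexp.MustCompile(`[ôö]`), "o"},
	{regexp.MustCompile(`[ÔÖ]`), "O"},
	{regexp.MustCompile(`[ùûü]`), "u"},
	{regexp.MustCompile(`[ÙÛÜ]`), "U"},
	{regexp.MustCompile(`ç`), "c"},
	{regexp.MustCompile(`Ç`), "C"},
}

// foldAccents applies every rule in order, replacing accented characters
// with their ASCII counterparts and leaving everything else untouched.
func foldAccents(s string) string {
	for _, r := range frenchFoldRules {
		s = r.pattern.ReplaceAllString(s, r.replace)
	}
	return s
}

// normalize folds accents and then lowercases, roughly what a character
// filter followed by a to_lower token filter would produce for one term.
func normalize(s string) string {
	return strings.ToLower(foldAccents(s))
}

func main() {
	variants := []string{"Belvédère", "belvédère", "belvedere", "belvedère", "Belvedere"}
	for _, v := range variants {
		// Every variant ends up as the same term: belvedere
		fmt.Printf("%s -> %s\n", v, normalize(v))
	}
}
```

The same normalization has to run on both the indexed text and the query for the five variants to match equally. Text stored in decomposed Unicode form (base letter plus combining accent) would not be caught by these patterns.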
Hi Steve,
That’s what we did as a workaround, but it adds some noise to the mapping.
If I read the issue correctly, there’s a good chance that an ASCII folding filter will be merged into Bleve in the future.
Does that mean it will also become available in Couchbase FTS, or is there a gap between Bleve’s latest version and the FTS features?
Thanks
steve
December 8, 2018, 5:21pm
Hi Anton – yes, there’s definitely a lag. Features, improvements, and fixes all go into the bleve open-source library first. Then later, a latest & greatest released version of Couchbase Server comes out incorporating the latest bleve.
Thanks Steve, wait and see then …