Hi,
I’m experimenting with character filters. The idea is to filter out German umlauts and some other characters. I can’t use the standard “de” analyzer, because I need prefix/wildcard search. So the plan is to replace these characters in Couchbase via a character filter (ü -> u etc.) and to apply the same replacement manually to the query string (because FTS doesn’t run the analyzer on wildcard/prefix queries).
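For the query-string side, this is roughly what I have in mind. It is only a minimal sketch in Python: the function name and the mapping table are my own placeholders (only ü -> u comes from my actual test), and the table has to mirror whatever character filters are defined on the index.

```python
import unicodedata

# Hypothetical mapping: only "ü" -> "u" is from my test setup, the other
# entries are assumed examples and must match the index's character filters.
UMLAUT_MAP = str.maketrans({"ü": "u", "ö": "o", "ä": "a"})

def normalize_wildcard_term(term: str) -> str:
    """Apply the same replacements to a wildcard/prefix term as the index does."""
    # Compose "u" + combining diaeresis into a single "ü" first, so that
    # both spellings of the umlaut are caught by the mapping.
    composed = unicodedata.normalize("NFC", term)
    return composed.translate(UMLAUT_MAP)

print(normalize_wildcard_term("müll*"))
```

The wildcard characters pass through untouched, so the result can be sent as the wildcard term directly.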
I indexed some documents with texts like “hello mister müller”.
Standard (no character filter): wildcard query with müll* works.
Character filter with “regular expression = ü, replace = u”: wildcard query with mull* does NOT work.
Character filter with “regular expression = ü, replace = [empty]”: wildcard query with mll* works.
Character filter with “regular expression = e, replace = a”: wildcard query with hall* works.
So something seems to be wrong with the ü -> u replacement, while ü -> empty and e -> a work perfectly.
Am I doing something wrong? Or could this be a UTF-8 problem, because ü is a 2-byte character while u is only 1 byte?
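To illustrate the byte-length difference I mean, here is a small Python check. It only shows the UTF-8 encodings of the characters from my tests; it is not meant to reproduce what Couchbase does internally, and the helper name is my own.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes a character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# Characters used in the three character-filter tests above.
for ch in ("ü", "u", "e", "a"):
    print(ch, utf8_len(ch), ch.encode("utf-8").hex())
```

So ü -> u is the only one of my tests where a character is replaced by a non-empty one of a different byte length; e -> a keeps the length the same, and ü -> empty removes the bytes completely.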
Thanks, Pascal