Sorry - I have been away.
Let me cover the issue in a bit more detail: N1QL and the Indexing Service rely on the golang string implementation to handle text.
Golang strings are slices of bytes that are interpreted as sequences of Unicode runes, which is nice and simple, because JSON strings are, in fact, sequences of Unicode code points.
So far so good: you may even get away with using a single-byte character encoding like CP1252 and still have text display properly (although this is technically not supported, since U+00C5 compares as different from CP1252 0xC5, even though both display as Å).
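A quick sketch of why the two encodings compare as different, fabricating the CP1252 byte by hand for illustration:

```go
package main

import "fmt"

func main() {
	utf8Str := "Å"                  // U+00C5 encoded as UTF-8: bytes 0xC3 0x85
	cp1252Str := string([]byte{0xC5}) // the single CP1252 byte for Å

	fmt.Println(utf8Str == cp1252Str)          // false: the byte sequences differ
	fmt.Printf("% X vs % X\n", utf8Str, cp1252Str) // C3 85 vs C5
}
```

Both render as Å in a suitably forgiving viewer, but to a byte-wise comparison they are simply different strings.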
We now enter the realm of locales, collation, equivalences and ligatures, and here is where things turn nasty.
In order to sort things properly, you need to know what language is involved: in Estonian, for instance, the ‘Z’ group is considered part of the ‘S’ group, and the collation order is …S, Š, Z, Ž, T, U, V, Õ, Ä, Ö, Ü. To make things worse, F is not even part of the native alphabet!
In German, the diacritic letters Ä, Ö and Ü are not part of the alphabet proper, and are considered modifications of A, O and U (in fact, before the umlaut mark came into use, they were printed as Ae, Oe and Ue), so they collate as A, Ä, … O, Ö, … U, Ü, which is different from Estonian (or Swedish), where Õ, Ä, Ö and Ü are considered separate from A, O and U.
From a collation point of view, certain runes are considered equivalent to others: the eszett (ß) sorts as equivalent to SS, sitting between “SR” and “ST”; or, to go back to Estonian, Ü is fast becoming equivalent to Y.
Lowercase runes are equivalent to upper case for collation purposes, and ‘a’ definitely comes before ‘B’.
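Byte order disagrees with that intuition: the standard library’s sort.Strings compares raw bytes, and ‘B’ (0x42) is smaller than ‘a’ (0x61). A minimal illustration:

```go
package main

import (
	"fmt"
	"sort"
)

func main() {
	words := []string{"apple", "Banana"}
	sort.Strings(words) // plain byte-wise comparison
	fmt.Println(words)  // [Banana apple]: 'B' (0x42) sorts before 'a' (0x61)
}
```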
I digress. The long and the short of it is that you have to know the locale involved before you can apply the rules.
Traditionally, this has been done using environment variables (LANG, NLS_LANG, LC_CTYPE, CLIENT_LOCALE, you name it…) to tell the operating system or your database engine how to sort and display.
In fact, things are even worse than that: normally, you have to tell the database engine what locale to use when storing and inserting data, AND tell the client what locale is in use on its side. The idea is that if you are using a client application in a Spanish territory to make (say) a booking on a Swedish database, the client and the server handshake so that characters unknown on one side get translated by the other, ensuring, for instance, that you do not insert invalid characters into the database.
I digress again.
The rub is that JSON has no means of specifying the locale that was used in a document, and golang has no support for setting a locale for processing purposes, so we do not know what locale to use for sorting, nor, lacking that, do we have a way to force anything other than byte order at the service level.
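What “byte order at the service level” means in practice is easy to see with sort.Strings: any multi-byte rune such as Ä (UTF-8 0xC3 0x84) lands after the entire ASCII range, regardless of what any human alphabet says:

```go
package main

import (
	"fmt"
	"sort"
)

func main() {
	words := []string{"Äpfel", "Zebra", "apple"}
	sort.Strings(words) // byte order: ASCII first, multi-byte runes last
	fmt.Println(words)  // [Zebra apple Äpfel]: Ä begins with 0xC3, after all of ASCII
}
```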
This does not mean to say that your requirement is incorrect.
It’s just that the underlying standard, JSON, has not defined the problem, so “fixing the issue” is non-trivial: lacking any indication from the storage, we’ll have to find some way to enforce a locale that is consistent across services and, more importantly, across clients.