I want to sort a dataset by one field. Since I do not want to apply any transformations like stemming to the value before sorting, I’ve choosen to use the keyword
analyzer which leaves the field value unchanged. This works for most cases but there are lots of words that are always sorted wrong. They seem to not have been index properly.
How to reproduce
Create some data:
INSERT INTO `Test`(KEY, VALUE) VALUES
("a", {"a": "Aaa" }),
("b", {"a": "Bbb" }),
("c", {"a": "Ccc" }),
("d", {"a": "Ddd" }),
("e", {"a": "Eee" }),
("f", {"a": "Fff" }),
("g", {"a": "Ggg" }),
("h", {"a": "Hhh" }),
("i", {"a": "Iii" }),
("j", {"a": "Jjj" }),
("k", {"a": "Kkk" }),
("l", {"a": "Lll" }),
("m", {"a": "Mmm" }),
("n", {"a": "Nnn" }),
("o", {"a": "Ooo" }),
("p", {"a": "Ppp" }),
("q", {"a": "Qqq" }),
("r", {"a": "Rrr" }),
("s", {"a": "Sss" }),
("t", {"a": "Ttt" }),
("u", {"a": "Uuu" }),
("v", {"a": "Vvv" }),
("w", {"a": "Www" }),
("x", {"a": "Xxx" }),
("y", {"a": "Yyy" }),
("z", {"a": "Zzz" });
Now create a fulltext index on the data and analyze the a
field with the “keyword” analyzer:
Json index definition
Finally issue a query towards the FTS index:
curl --request POST 'http://localhost:8094/api/index/test/query' \
--header 'Authorization: Basic <...>' \
--header 'Content-Type: application/json' \
--data-raw '{
"size": 30,
"fields": [ "*" ],
"sort": [ "a" ],
"query": {
"query": "a:*"
}
}'
The result will contain most words sorted in alphabetical order. Just some words have ����������
in their sort array and are not sorted properly. According to the documentation this means that the field is not indexed:
{
"status": {
"total": 6,
"failed": 0,
"successful": 6
},
"request": {
"query": {
"query": "a:*"
},
"size": 30,
"from": 0,
"highlight": null,
"fields": [
"*"
],
"facets": null,
"explain": false,
"sort": [
"a"
],
"includeLocations": false,
"search_after": null,
"search_before": null
},
"hits": [
{
"index": "test_439eb73d2e79c6c4_f4e0a48a",
"id": "a",
"score": 0.17139825364493735,
"sort": [
"Aaa"
]
},
{
"index": "test_439eb73d2e79c6c4_54820232",
"id": "b",
"score": 0.21164339756999317,
"sort": [
"Bbb"
]
},
{
"index": "test_439eb73d2e79c6c4_18572d87",
"id": "c",
"score": 0.17139825364493735,
"sort": [
"Ccc"
]
},
{
"index": "test_439eb73d2e79c6c4_f4e0a48a",
"id": "d",
"score": 0.17139825364493735,
"sort": [
"Ddd"
]
},
{
"index": "test_439eb73d2e79c6c4_6ddbfb54",
"id": "e",
"score": 0.21164339756999317,
"sort": [
"Eee"
]
},
{
"index": "test_439eb73d2e79c6c4_18572d87",
"id": "f",
"score": 0.17139825364493735,
"sort": [
"Fff"
]
},
{
"index": "test_439eb73d2e79c6c4_54820232",
"id": "g",
"score": 0.21164339756999317,
"sort": [
"Ggg"
]
},
{
"index": "test_439eb73d2e79c6c4_54820232",
"id": "h",
"score": 0.21164339756999317,
"sort": [
"Hhh"
]
},
{
"index": "test_439eb73d2e79c6c4_aa574717",
"id": "i",
"score": 0.21164339756999317,
"sort": [
"Iii"
]
},
{
"index": "test_439eb73d2e79c6c4_6ddbfb54",
"id": "j",
"score": 0.21164339756999317,
"sort": [
"Jjj"
]
},
{
"index": "test_439eb73d2e79c6c4_13aa53f3",
"id": "k",
"score": 0.21164339756999317,
"sort": [
"Kkk"
]
},
{
"index": "test_439eb73d2e79c6c4_aa574717",
"id": "l",
"score": 0.21164339756999317,
"sort": [
"Lll"
]
},
{
"index": "test_439eb73d2e79c6c4_f4e0a48a",
"id": "m",
"score": 0.17139825364493735,
"sort": [
"Mmm"
]
},
{
"index": "test_439eb73d2e79c6c4_13aa53f3",
"id": "n",
"score": 0.21164339756999317,
"sort": [
"Nnn"
]
},
{
"index": "test_439eb73d2e79c6c4_18572d87",
"id": "o",
"score": 0.17139825364493735,
"sort": [
"Ooo"
]
},
{
"index": "test_439eb73d2e79c6c4_aa574717",
"id": "p",
"score": 0.21164339756999317,
"sort": [
"Ppp"
]
},
{
"index": "test_439eb73d2e79c6c4_f4e0a48a",
"id": "q",
"score": 0.17139825364493735,
"sort": [
"Qqq"
]
},
{
"index": "test_439eb73d2e79c6c4_6ddbfb54",
"id": "y",
"score": 0.21164339756999317,
"sort": [
"Yyy"
]
},
{
"index": "test_439eb73d2e79c6c4_18572d87",
"id": "z",
"score": 0.17139825364493735,
"sort": [
"Zzz"
]
},
{
"index": "test_439eb73d2e79c6c4_f4e0a48a",
"id": "x",
"score": 0.17139825364493735,
"sort": [
"����������"
]
},
{
"index": "test_439eb73d2e79c6c4_6ddbfb54",
"id": "v",
"score": 0.21164339756999317,
"sort": [
"����������"
]
},
{
"index": "test_439eb73d2e79c6c4_13aa53f3",
"id": "r",
"score": 0.21164339756999317,
"sort": [
"����������"
]
},
{
"index": "test_439eb73d2e79c6c4_13aa53f3",
"id": "w",
"score": 0.21164339756999317,
"sort": [
"����������"
]
},
{
"index": "test_439eb73d2e79c6c4_54820232",
"id": "t",
"score": 0.21164339756999317,
"sort": [
"����������"
]
},
{
"index": "test_439eb73d2e79c6c4_aa574717",
"id": "u",
"score": 0.21164339756999317,
"sort": [
"����������"
]
},
{
"index": "test_439eb73d2e79c6c4_18572d87",
"id": "s",
"score": 0.17139825364493735,
"sort": [
"����������"
]
}
],
"total_hits": 26,
"max_score": 0.21164339756999317,
"took": 9966662,
"facets": null
}
Is this on purpose and I’ve misconfigured / misunderstood something in FTS or is this a bug?
I’ve tried this on Couchbase 6.5.1 enterprise and community.