I am hoping to replace a hand-rolled solution based on the “damerauLevenshtein” algorithm with an FTS counterpart.
How does one design the index, or construct a query, so that it returns only the matches whose similarity to the search term exceeds a given percentage, say 50%?
If I search for “Joseph Public” in a name field, I would like all matches returned whose similarity to that search term exceeds 50% (or whatever similarity threshold is provided).
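To make the threshold concrete: by “similarity” I mean a normalized score in [0, 1] derived from the edit distance. A minimal sketch of the semantics I have in mind (difflib is used here purely as a stand-in for our actual damerauLevenshtein code, and all names are illustrative):

from difflib import SequenceMatcher  # stand-in only; our real code computes Damerau-Levenshtein

def similar_enough(term: str, value: str, threshold: float = 0.5) -> bool:
    # "Similarity exceeding 50%" means this normalized score is above 0.5.
    return SequenceMatcher(None, term.lower(), value.lower()).ratio() > threshold

# similar_enough("Joseph Public", "Josef Publik")  -> True (close spellings pass)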
I can’t think of a direct/explicit way to achieve this.
But a couple of related options come to mind:
Try a match query with a “prefix_length” parameter set high enough that the exact-prefix portion of each token guarantees the minimum amount (percentage) of matching you need, so that much of every token is already matched exactly.
The match query also accepts a “fuzziness” parameter (an edit distance of at most 2), which is then applied to the remainder of each token beyond the specified prefix_length.
e.g.:
{
  "query": {
    "match": "Joseph Public",
    "field": "name",
    "operator": "and",
    "fuzziness": 2,
    "prefix_length": 7
  }
}
Another way to achieve a similar result, when there are always multiple tokens to search for, is to use boosting based on the number of tokens searched.
For example, you can have a disjunction query with multiple child match_phrase/phrase queries (depending on your requirements), giving the highest boost to the child query with the maximum number of tokens to search for, as in the example below (concrete boost values of 3.0, 2.0 and 1.0 standing in for N, 2N/3 and N/3).
e.g.:
{
  "query": {
    "disjuncts": [
      { "match_phrase": "term1 term2 term3", "field": "name", "boost": 3.0 },
      { "match_phrase": "term1 term2", "field": "name", "boost": 2.0 },
      { "match_phrase": "term2 term3", "field": "name", "boost": 1.0 }
    ]
  }
}
But all of these are approximations rather than a precise answer to your requirement.
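If you need the cutoff to be exact, one pragmatic pattern is to let FTS do loose candidate retrieval (fuzziness/prefix_length as above, with “fields” requested in the search body so the stored field values come back in the hits) and then enforce the percentage threshold client-side. A minimal sketch, assuming hits are taken from the FTS REST response and using the optimal-string-alignment variant of Damerau-Levenshtein; the function and variable names here are illustrative, not an FTS API:

def osa_distance(a: str, b: str) -> int:
    """Damerau-Levenshtein distance (optimal string alignment variant)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

def similarity(a: str, b: str) -> float:
    """Normalize the distance into a 0..1 similarity score."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest

def filter_hits(hits, query: str, field: str, threshold: float = 0.5):
    """Keep only hits whose stored field value is similar enough to the query."""
    return [h for h in hits
            if similarity(query.lower(), h["fields"][field].lower()) >= threshold]

# Example shape of usage (response parsed from the FTS REST query endpoint,
# with "fields": ["name"] included in the search request body):
# good = filter_hits(resp["hits"], "Joseph Public", "name", threshold=0.5)

This keeps the index and query simple while reusing the exact similarity measure you already trust; FTS narrows the candidate set so the expensive distance computation only runs on a handful of hits.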
I appreciate your effort. I will review your suggestions to determine whether they get us closer to our objective.
The more code I can replace with features and solutions already available in FTS, the better. A 25% reduction or more would be a nice first round of refactoring and optimization.