FTS Scoring Logic

Han_Chris1 · September 21, 2020, 5:14am

Hi,
I’ve just tried an FTS node locally to test the functionality.
I added some documents (containing name, dob, email) and then indexed those 3 fields.

The top result’s score is 0.121 and the second one’s is 0.070.
From my point of view, the score for the second one should be higher, because it has 3 matching words instead of 2.

My question:
What is the logic behind this scoring?
Why is the score for the second one lower than for the first one?

Thanks

sreeks · September 21, 2020, 8:55am

Hi @Han_Chris1,

Your expectation is legitimate, and that is how it should work.
But score computation happens at the index partition level: tf-idf is computed within each individual partition, which can create such differences with smaller data sets.
With larger data sets, these idf differences settle down or become negligible.
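
To illustrate the partition effect described above, here is a minimal sketch. It assumes a Lucene-style formula `idf = 1 + ln(N / (df + 1))`; the exact formula used by the FTS engine may differ, and the documents and the two-way split are made up for the example.

```python
import math

def idf(total_docs, docs_with_term):
    # Assumed Lucene-style idf; the real engine's formula may differ.
    return 1.0 + math.log(total_docs / (docs_with_term + 1))

def partition_idfs(partitions, term):
    """Compute the idf of `term` independently inside each partition."""
    return [
        idf(len(part), sum(term in doc for doc in part))
        for part in partitions
    ]

# Hypothetical tiny data set: each document is the set of its terms.
docs = [
    {"chris", "han"}, {"chris", "lee"}, {"chris", "wong"},
    {"anna", "han"}, {"bob", "smith"}, {"carol", "jones"},
]

# Hypothetical split into two partitions; "chris" lands only in the first.
partitions = [docs[:3], docs[3:]]

global_idf = partition_idfs([docs], "chris")[0]   # one partition: whole data set
per_partition = partition_idfs(partitions, "chris")

print("single partition:", round(global_idf, 3))
print("two partitions:  ", [round(v, 3) for v in per_partition])
```

With so few documents, the same term gets a noticeably different idf depending on which partition a document landed in, so scores from different partitions are not directly comparable. As the number of documents per partition grows, the per-partition values tend to converge.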

I guess you are using the default partition count of 6?
If you change this to 1, the scoring should work as per your expectations.
Changing the number of partitions is straightforward in recent releases (6.5+), i.e. it can be done from the UI.
In older releases you need to do this over REST with curl commands. Let me know if you need any help there.
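
For the REST route, a rough sketch of the idea follows. Everything specific in it is an assumption to verify against the docs for your release: that the index definition keeps its partition settings under `planParams` (`indexPartitions`, `maxPartitionsPerPIndex`), that a bucket has 1024 vBuckets, and that the definition is updated with a PUT to `/api/index/<name>` on the FTS port (8094). The helper names are hypothetical.

```python
import copy
import json
from urllib.request import Request

def with_single_partition(index_def):
    """Return a copy of an index definition that asks for one partition.

    Assumption: partition settings live under "planParams".
    """
    updated = copy.deepcopy(index_def)
    plan = updated.setdefault("planParams", {})
    plan["indexPartitions"] = 1            # assumed field name (6.5+)
    plan["maxPartitionsPerPIndex"] = 1024  # assumed: all vBuckets in one partition
    return updated

def build_update_request(base_url, index_def):
    """Build (but do not send) the PUT request for the updated definition.

    Assumption: the endpoint is <base_url>/api/index/<index name>.
    Credentials are not added here; the caller must supply them.
    """
    url = f"{base_url}/api/index/{index_def['name']}"
    body = json.dumps(index_def).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})

# Hypothetical definition, as if fetched from the existing index.
current = {"name": "my_index",
           "planParams": {"indexPartitions": 6, "maxPartitionsPerPIndex": 171}}
request = build_update_request("http://localhost:8094",
                               with_single_partition(current))
print(request.get_method(), request.full_url)
```

Starting from the definition fetched from the running index, rather than writing one from scratch, keeps the other settings (type mappings, analyzers and so on) unchanged.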

You can find the scoring details here: Troubleshooting and FAQs | Couchbase Docs
Also, there is a Show Scoring check box on the search page shown in your screenshot.

Cheers!

Han_Chris1 · September 21, 2020, 1:30pm

Hi @sreeks,

Thanks for your response.
Yes, I’ve just tried changing the total partitions to 1 and it works as expected.

So, you mean that if my dataset is small I should use 1 partition, and if the dataset is large I should switch to 6 partitions?
Or what’s the best practice for setting the total partitions?

Thanks

sreeks · September 21, 2020, 2:00pm

If your data set is small and your use case depends on tf-idf scores, then a single-partition index is a possibility. Please note that small/big is subjective and depends on the scenario, but a few million documents should be reasonable for a single-partition index.

You might need to revisit this once you have performance targets/SLAs to meet, as partitions help in parallelising the search/indexing workload.

The total partition setting comes under cluster scaling/sizing, and you might want to reach out to the support/solutions team for detailed help there.

Cheers!
