FTS partial phrase search

iisuru · February 9, 2019, 4:44am

Hi All,
I want to search a document field with large paragraph data.
For example, let's say the field value is “Today is a rainy day. Umbrella is required. I came yesterday. I will come for five weekdays.”

Does FTS create an inverted index by tokenizing on spaces and let me search by exact word match?
For example, a search for “rainy” will return 1 result and a search for “required” will return 1 result. A search for “Tod” will return 0, since it is only a partial match.

I want to search for the partial phrase “day”. What additional index-level configuration is needed to perform partial matches in a text field, regardless of whether the match is a prefix or a suffix?

Please send your replies.
Thanks
Isuru

sreeks · February 9, 2019, 5:42am

FTS creates an index by tokenising (text analysis) the document field contents. How the text is tokenised depends on the analysers configured for the field in the index definition/mapping.
Useful information can be found here:
https://docs.couchbase.com/server/6.0/fts/fts-creating-indexes.html
https://docs.couchbase.com/server/6.0/fts/fts-using-analyzers.html

For partial searches, you may need to use different query types like prefix, wildcard, or regex.
https://docs.couchbase.com/server/6.0/fts/fts-queries.html

Depending on the analysers used, the token “rainy” could become “rain” in the index.
Another important thing to be aware of is that, during query processing, the query text goes through the same text analysis process. It uses the analyser configured for the respective fields in the index definition you are querying against.
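A minimal sketch of that idea, using a toy analyser rather than any real FTS analyser. The stemming rule, the function names and the sample document are all hypothetical; the only point is that the same `analyze` function runs on both the indexed text and the query text.

```python
import re
from collections import defaultdict

VOWELS = set("aeiou")

def toy_stem(token):
    # Hypothetical rule, for illustration only:
    # drop a trailing "y" that follows a consonant ("rainy" -> "rain").
    if len(token) > 3 and token.endswith("y") and token[-2] not in VOWELS:
        return token[:-1]
    return token

def analyze(text):
    # Split on non-letters, lower-case each token, then stem it.
    return [toy_stem(t.lower()) for t in re.findall(r"[A-Za-z]+", text)]

def build_index(docs):
    # Inverted index: analysed term -> set of document ids.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    # The query text goes through the same analyser as the indexed text.
    hits = [index.get(term, set()) for term in analyze(query)]
    return set.intersection(*hits) if hits else set()

docs = {"doc1": "Today is a rainy day. Umbrella is required."}
index = build_index(docs)

print(sorted(index))            # the terms actually stored
print(search(index, "Rainy"))   # analysed to "rain", so it matches
print(search(index, "Tod"))     # no term "tod" exists, so no match
```

Because the index stores “rain” and not “rainy”, a query only finds the document if it is analysed the same way as the indexed text.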

regards,
Sreekanth

iisuru · February 9, 2019, 6:03am

Hi Sreekanth,
Thanks for the reply.
Just a little more clarification needed.

Let's say I want to do a traditional LIKE search, %searchText%. This will list all the matches regardless of whether searchText is a prefix or a suffix.

Do you mean that in FTS I need to combine a prefix query, a suffix query and another regex query for the other scenarios? Meaning I need at least 3 queries altogether to do a partial search?

sreeks · February 11, 2019, 4:25am

I guess you might have figured this already by going over the query types.
You just need to use one of those query types, based on the exact query requirements.
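As a rough illustration of picking one query type per requirement, the sketch below maps a simple LIKE pattern to a single FTS query body. The helper name and the “body” field are hypothetical, and it assumes `%` appears only at the ends of the pattern and that the index uses a lower-casing analyser. Note that these queries match individual indexed terms, not the whole field value, so the result is not identical to SQL LIKE on multi-word text.

```python
import json

def like_to_fts_query(like_pattern, field):
    """Translate a simple LIKE pattern into one FTS query body (sketch)."""
    leading = like_pattern.startswith("%")
    trailing = like_pattern.endswith("%")
    text = like_pattern.strip("%").lower()
    if not leading and not trailing:
        # No wildcard at all: exact term lookup.
        return {"term": text, "field": field}
    if not leading and trailing:
        # text% : a prefix query is enough.
        return {"prefix": text, "field": field}
    # %text or %text% : a single wildcard query covers both.
    pattern = "*" + text + ("*" if trailing else "")
    return {"wildcard": pattern, "field": field}

for like in ("Tod%", "%day", "%day%"):
    request_body = {"query": like_to_fts_query(like, "body")}
    print(like, "->", json.dumps(request_body))
```

So a `%searchText%` style search needs one wildcard (or regexp) query, not three separate queries combined.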

In case you need to perform exact partial/substring searches as you mentioned before, you may need to use n-gram analysers during indexing. We don’t support certain cases of regular expressions (in regex queries) for performance/scalability reasons. For example, “word boundaries are not allowed”.

https://docs.couchbase.com/server/6.0/fts/fts-using-analyzers.html
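To see why n-grams help with substring search, here is a small sketch of what an n-gram filter emits for one token. This is a plain Python illustration, not the FTS implementation; the function name and the min/max values are chosen only for the example.

```python
def ngrams(token, min_n, max_n):
    """Return every substring of token whose length is between min_n and max_n."""
    grams = []
    for n in range(min_n, min(max_n, len(token)) + 1):
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams

print(ngrams("rainy", 2, 3))

# With the n-grams stored as index terms, a substring search
# becomes an exact term lookup.
indexed_terms = set(ngrams("yesterday", 3, 5))
print("day" in indexed_terms)        # True: "day" is one of the 3-grams
print("yesterday" in indexed_terms)  # False: longer than max_n
```

The trade-off is that only substrings whose length falls between the configured minimum and maximum can be found this way.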

keshav_m · February 11, 2019, 7:24am

Another option is to use FTS to narrow the data down to a set of candidate results and then use N1QL to apply the LIKE predicate on top of it.

See: https://blog.couchbase.com/curl-comes-n1ql-querying-external-json-data/

Doing things like this will become easier in the upcoming release.
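The second stage of that approach can be sketched in plain Python. Here `candidates` stands in for whatever documents the FTS stage returned, and `like_to_regex` and `refine` are hypothetical helpers that emulate a LIKE predicate; in practice that predicate would run in N1QL.

```python
import re

def like_to_regex(pattern):
    # Emulate SQL LIKE: % matches any run of characters, _ matches one character.
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def refine(candidates, field, pattern):
    # Keep only the candidates whose full field value satisfies the LIKE pattern.
    rx = like_to_regex(pattern)
    return [doc for doc in candidates if rx.fullmatch(doc.get(field, ""))]

# Stand-in for the documents returned by the FTS stage.
candidates = [
    {"id": "a", "body": "Today is a rainy day."},
    {"id": "b", "body": "Umbrella is required."},
]

matches = refine(candidates, "body", "%rainy day%")
print([doc["id"] for doc in matches])
```

The LIKE stage sees the whole field value, so it can match text that spans several tokens, while the FTS stage keeps the number of documents it has to scan small.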

iisuru · February 11, 2019, 4:16pm

Sreeks,
Is there any way to pass the minimum and maximum to the ngram analyser depending on token length?
i.e. ngram min 1 and max length(token)

This will ensure all the required token combinations are indexed.

Thanks

sreeks · February 11, 2019, 5:07pm

I don’t think there is a way to do this with dynamically varying token sizes.
In any case, this approach won’t scale. It will have a huge space amplification factor for the inverted index, and it won’t work out on a reasonably heavily loaded system.
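A quick back-of-the-envelope count shows the amplification. With a minimum of 1 and a maximum equal to the token length L, one token produces L + (L-1) + ... + 1 = L(L+1)/2 n-grams. The sketch below just computes that sum; the function name is made up for the example, and duplicates are counted, so the number of distinct terms can be somewhat lower.

```python
def ngram_count(token_length, min_n, max_n):
    """Number of n-grams (duplicates included) emitted for one token."""
    upper = min(max_n, token_length)
    return sum(token_length - n + 1 for n in range(min_n, upper + 1))

for length in (5, 10, 20):
    # min = 1 and max = token length, as proposed above
    print(length, ngram_count(length, 1, length))
```

A single 20-character token already turns into 210 index entries, and that growth applies to every token in every document.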

iisuru · February 11, 2019, 5:34pm

Thanks sreeks
Hmm, then the way to go should be a regex-based search for partial phrases.
I presume regex should be enough to do a prefix, suffix or middle match on a certain field.

iisuru · February 12, 2019, 5:54pm

Hi Sreeks,
Need clarification on the below.

1. When an index is defined in the FTS console, we define the index for a “country” field in a given type, and in the Java SDK we write something like SearchQuery.match(“usa”), so it will simply match against the indexed country field. If we write SearchQuery.match(“FL”).field(“state”), does it bypass the index and do a key-value search, or do we need to create an additional index using the “state” field?
2. If I search using multiple fields, let's say 3 fields in the same document type, is it good practice to create 3 indexes and use 3 conjunction queries to combine them, or is there some other method?
3. Is there any way to search by fields other than the indexed fields? This might be related to the 1st concern.
4. SearchQuery.regexp(".*texttosearch.*") (dot asterisk texttosearch dot asterisk) fetches all the partial phrases matching “texttosearch”. It seems prefixes and suffixes are included too. I just need to ensure lower-case text is passed. So this will resolve the partial text search issue.
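The behaviour of a `.*texttosearch.*` pattern can be imitated with a small sketch. It assumes the regular expression is tested against each indexed term as a whole and that the analyser has lower-cased those terms; it uses Python's `re` module, whose syntax is not identical to the one FTS accepts, and the term list is made up from the sample sentence.

```python
import re

# Terms as a lower-casing analyser might have stored them (hypothetical index contents).
terms = ["today", "rainy", "day", "yesterday", "weekdays", "umbrella"]

def regexp_terms(pattern, terms):
    # Keep the terms that the pattern matches in full.
    rx = re.compile(pattern)
    return [t for t in terms if rx.fullmatch(t)]

print(regexp_terms(".*day.*", terms))  # prefix, suffix and middle matches
print(regexp_terms(".*Day.*", terms))  # nothing: the stored terms are lower case
```

This is why the search text has to be lower-cased before it is placed in the pattern: the pattern is compared with the stored terms as they are.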

Please clarify the top 3 issues. I would also appreciate your feedback on regex-based partial text search.

Thanks
Isuru

iisuru · February 13, 2019, 8:50am

Sreek
After further exploration I found that we can index multiple fields under the same FTS index, and I can use the field name to search on the desired indexed field.

You can ignore my previous question. However, you can still give your invaluable input.
Thanks
Isuru

sreeks · February 13, 2019, 11:13am

Hey,
Let me try to give a brief answer to each of your concerns:

1. FTS is capable of serving search requests only on the indexed fields. It won’t do anything extra, nor will it throw an error saying the field is not indexed. It is up to the administrator to ensure that the right fields are indexed. For SDK-specific queries, it is better to post in the SDK forums to get quick and precise answers.
2. For just three fields, it’s normal and usually better to create a single FTS index comprising all three of them and perform suitable queries on it. Having said this, there could be exceptional cases arising from scaling, performance or data impedance reasons to do otherwise.
   [There is a feature called FTS index alias. An alias can cover multiple FTS indexes, and once you submit a query against an index alias, the search is performed on all the indexes behind it and the results are returned. Refer to the documentation.]
3. Searching un-indexed fields is not a capability FTS provides.
4. Passing lower case is expected, as that results from the analysers used. Please refresh yourself on those documentation links.
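For the three-field case, a single index plus one conjunction query could look roughly like the sketch below. It only builds the JSON request body; the `match` and `conjunction` helpers, the “city” field and the “miami” value are hypothetical, while “country” and “state” come from the question above.

```python
import json

def match(text, field):
    # One match clause scoped to a single indexed field.
    return {"match": text, "field": field}

def conjunction(*clauses):
    # All clauses must match the same document.
    return {"conjuncts": list(clauses)}

request_body = {
    "query": conjunction(
        match("usa", "country"),
        match("FL", "state"),
        match("miami", "city"),   # hypothetical third field
    )
}

print(json.dumps(request_body, indent=2))
```

Each clause names its own field, so one index that covers all three fields can serve the whole query.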

Cheers!,
Sreekanth

iisuru · February 13, 2019, 11:26am

Sreeks
Thanks for the reply. I have now demystified what is needed for my app. For lower case I can use the to_lower filter so that the indexed terms are stored in lower case.
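A custom analyser that applies only the to_lower filter might be declared along the lines of the fragment below. The analyser name “lowercase_only” is hypothetical, and the key names (“type”, “tokenizer”, “token_filters”) are assumptions about the shape of the analysis section of an index definition, so they should be checked against the analysers documentation linked earlier in the thread.

```python
import json

# Assumed shape of the "analysis" section of an FTS index definition.
analysis = {
    "analyzers": {
        "lowercase_only": {                 # hypothetical analyser name
            "type": "custom",
            "tokenizer": "unicode",         # assumed tokenizer name
            "token_filters": ["to_lower"],  # lower-case every token
        }
    }
}

print(json.dumps(analysis, indent=2))
```

With such an analyser on the field, the same lower-casing is applied to analysed query text, although regexp and wildcard patterns still need to be passed in lower case, as noted above.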

Thanks for the continuous support

Regards
Isuru

iisuru · February 15, 2019, 10:05am

Sreeks, based on your inputs and the Couchbase URLs I gave a public talk on Couchbase FTS.

If you get time, go through it and let me know if I misspoke. I am a newbie to Couchbase and FTS.

Thanks
Isuru
