Using CB CE 4.1 and in the process of evaluating CB EE 4.5
Created secondary indexes on the attributes that are in the WHERE clause. However, the statement
select * from mybucket where firstname like '%bob%' is very slow… I read in some of the forum posts that you have a fix in 4.5.1 or in the CB FT Developer Preview. Can I get a download link to try?
I don’t have information about possible performance improvements in newer versions, but generally speaking (not Couchbase queries specifically), a wildcard at the beginning of a pattern is going to be relatively slow — usually significantly slower than a wildcard at the end.
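As a rough illustration of why (not Couchbase internals, just the general principle), here is a small Python sketch with made-up data: a sorted token list stands in for a secondary index, so a trailing-wildcard prefix query becomes a binary-search range scan, while a leading wildcard forces a scan of every value.

```python
import bisect

names = ["washington bob", "jones bobby", "alice smith"]

# Build a sorted (token, doc_id) list once, simulating what a
# secondary index over the tokenized field would store.
tokens = sorted({(tok, i)
                 for i, name in enumerate(names)
                 for tok in name.lower().split()})

def prefix_match(prefix):
    # Like an index range scan for LIKE 'bob%': binary-search to the
    # first candidate, then walk only the entries sharing the prefix.
    lo = bisect.bisect_left(tokens, (prefix,))
    hits = set()
    while lo < len(tokens) and tokens[lo][0].startswith(prefix):
        hits.add(tokens[lo][1])
        lo += 1
    return sorted(hits)

def substring_match(sub):
    # Like LIKE '%bob%': the leading wildcard defeats the ordering,
    # so every stored value must be examined.
    return [i for i, name in enumerate(names) if sub in name.lower()]
```

Both return the same documents for a query like "bob" here, but the prefix version touches only the matching slice of the index instead of every row.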
I am trying this method and will see how it performs. Tokenizing and then using wildcards at the end meets all your requirements with better performance. The only issue I have with this is tokenizing on the fly vs. storing tokenized values directly in the index. I am slightly concerned about how this will perform, but I would be much more concerned about leading-wildcard performance.
Anyways, here it is (tokenizing on whitespace, lowercasing for case insensitivity):
WHERE ANY t IN SPLIT(LOWER(`firstname`)) SATISFIES t LIKE 'bob%' END
This will for example match “Jones Bobby”.
Hope this is helpful even though it doesn’t directly answer the question.
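For what it's worth, in 4.5 that tokenized expression can also be backed by an array index, so the ANY clause becomes an index range scan rather than a bucket scan. A sketch (the bucket and index names are just placeholders):

```sql
-- Array index over the lowercased, whitespace-split tokens of firstname.
CREATE INDEX idx_firstname_tokens ON `mybucket`
  (DISTINCT ARRAY t FOR t IN SPLIT(LOWER(`firstname`)) END);

-- Prefix query written to match the index expression above.
SELECT * FROM `mybucket`
WHERE ANY t IN SPLIT(LOWER(`firstname`)) SATISFIES t LIKE 'bob%' END;
```

Note that the expression in the query has to match the indexed expression for the optimizer to pick the array index.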
Thank you, that blog entry is very helpful. I also didn’t realize that you could so easily index array entries produced by a function. I will still prefer tokenization over suffixes, since I only need to match prefixes of each token or word; it also requires far fewer index entries.
It is my view that queries equivalent to LIKE '%bob%' are usually unnecessary, and I would encourage avoiding them. In the case of “washington”, I’m not sure how often someone would search for it with “hington” or “ashing”, vs. searching for “wash”, “washing”, or some other prefix.
I think prefix queries on a tokenized field are far more suitable for most cases. The next step would be to allow some level of fuzziness on top of that.