I am implementing a typeahead feature in our system where our customers can search for products in their own product catalog.
I created a text search index on the product name with a custom analyzer (single tokenizer and toLower filter).
I am doing a regex query with a limit of 500 in order to return all the products that contain the search value somewhere in the product name.
For example, if a user searches for “drink”, the regex query I run is .*drink.*
Basically, it works really well, but when the user searches for something with a lot of results I get an error: TooManyClauses[maxClauseCount is set to 1024]
We have 8172 documents for “Bosch” products. I am aware of the reason why I get the “TooManyClauses” error (at least I think so), but I only want 500 documents, not all the results.
Is there a way to get some results without getting the “TooManyClauses” error?
Hello @Eli_gotesman, the TooManyClauses error occurs because more than 1024 terms in your index matched the regex, e.g. “.*drink.*”. Limit/size has nothing to do with this setting: size is applied to the result set only after all the matches for the search term have been determined.
Now, I’m not certain what release you’re on, but in the upcoming release we intend to make maxClauseCount configurable at run time. Until then, you would need to narrow the regex so that the number of terms it can match stays below 1024.
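To make the order of operations concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not the actual search engine code: the `regex_search` function, the `TooManyClauses` exception class, and the catalog data are all hypothetical, and it assumes each product name is indexed as a single term.

```python
import re

MAX_CLAUSE_COUNT = 1024  # the limit quoted in the error message


class TooManyClauses(Exception):
    """Hypothetical stand-in for the real error."""


def regex_search(index, pattern, size):
    """Toy model of a regex query; `index` maps term -> list of doc ids."""
    rx = re.compile(pattern)
    # Step 1: expand the regex into every indexed term it matches.
    candidates = [term for term in index if rx.fullmatch(term)]
    # Step 2: the clause limit is checked on the number of terms,
    # before any result set exists.
    if len(candidates) > MAX_CLAUSE_COUNT:
        raise TooManyClauses(f"{len(candidates)} terms > {MAX_CLAUSE_COUNT}")
    # Step 3: only now are documents collected and cut down to `size`.
    hits = sorted({doc for term in candidates for doc in index[term]})
    return hits[:size]


# Made-up catalog: 8172 distinct "bosch ..." names, one term per name.
index = {f"bosch drill model {i}": [i] for i in range(8172)}
index["energy drink"] = [9000]
index["drinking glass"] = [9001]

print(regex_search(index, ".*drink.*", 500))  # few terms match, so this works

try:
    regex_search(index, ".*bosch.*", 500)  # 8172 terms match, size never applies
except TooManyClauses as err:
    print("error:", err)

# A narrower pattern matches fewer terms and stays under the limit.
print(len(regex_search(index, "bosch drill model 12.*", 500)))
```

In this model the size of 500 never gets a chance to help with the broad pattern, because the error is raised in step 2, while the narrower pattern passes the check and is then truncated as usual.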
To clarify the error message: it doesn’t necessarily mean that more than 1024 documents (the default TooManyClauses limit) matched.
It means that more than 1024 searchable candidate terms satisfy the regex or the given query criteria. So eventually there could be thousands or millions of documents satisfying the search.
The intent of this error is to hint that the user should be more specific in the search, so that the query returns more relevant results.
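The difference between matching terms and matching documents can be illustrated with a small Python sketch. This is only a toy inverted index, not the real implementation: `build_index`, `count_matches`, and the product names are hypothetical, and it assumes that a single tokenizer stores the whole name as one term, with a plain whitespace split standing in for a word-level tokenizer.

```python
import re
from collections import defaultdict


def build_index(names, tokenize):
    """Build a toy inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, name in enumerate(names):
        for term in tokenize(name.lower()):
            index[term].add(doc_id)
    return index


def count_matches(index, pattern):
    """Return (number of matching terms, number of matching documents)."""
    rx = re.compile(pattern)
    terms = [term for term in index if rx.fullmatch(term)]
    docs = set().union(*(index[term] for term in terms))
    return len(terms), len(docs)


names = [f"Bosch Drill {i}" for i in range(2000)]  # made-up product names

whole_name = build_index(names, lambda s: [s])      # one term per product name
per_word = build_index(names, lambda s: s.split())  # one term per word

print(count_matches(whole_name, ".*bosch.*"))  # (2000, 2000)
print(count_matches(per_word, ".*bosch.*"))    # (1, 2000)
```

Both indexes return the same 2000 documents, but only the first one needs 2000 candidate terms to get there, and it is the term count that the limit is checked against. Note that a word-level index would no longer match substrings that span several words, so this is an illustration of the counting, not a drop-in recommendation.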
It’s going to be in the 6.5.0 release. The timelines are yet to be finalised.