FTS: matching an exact word of the format "aaaa:100@143"

I am trying to get a full match on a string of the form aaaa:100@1 but somehow I can’t seem to be able to make it work. I am getting a lot of results, but none of them contain this. I believe the tokenizer is splitting this string in multiple words.

What options do I have?

I believe certain characters in the search string need to be escaped.

Thanks for the answer.

I am still getting very wrong results. Do I need a specific Analyzer?

(new MatchSearchQuery($airSearchString))->field("modelAIR")->boost(5);

This is how I am doing it now.

If you show exactly what you are doing and what you expect to get, and what you are getting instead, someone may be able to help.

Sorry, tried to keep it short

This is the index definition:

{
  "type": "fulltext-index",
  "name": "models_meta",
  "uuid": "7108c371cfbdcc87",
  "sourceType": "gocbcore",
  "sourceName": "models_meta",
  "sourceUUID": "6b4fdfe07aeefda0b48305fbad534c34",
  "planParams": {
    "maxPartitionsPerPIndex": 1024,
    "indexPartitions": 1
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": ".*",
      "mode": "docid_regexp",
      "type_field": "type"
    },
    "mapping": {
      "analysis": {
        "analyzers": {
          "en-without-stop-words": {
            "token_filters": [
              "to_lower",
              "possessive_en"
            ],
            "tokenizer": "unicode",
            "type": "custom"
          }
        }
      },
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": true,
        "enabled": true
      },
      "default_type": "_default",
      "docvalues_dynamic": false,
      "index_dynamic": true,
      "store_dynamic": false,
      "type_field": "_type"
    },
    "store": {
      "indexType": "scorch",
      "segmentVersion": 15
    }
  },
  "sourceParams": {}
}

Here is a sample of a document that I am indexing:

{
  "addedUnixTimestamp": 1701517809,
  "imageUrlHash": "bf069314ae1b298dd8fc108274d30489",
  "modelAIR": "aaa:100000@190800",
  "modelSize": 114438268,
  "thumbsDownCount": 0,
  "thumbsUpCount": 0,
  "type": 3
}

I am trying to search for an exact match of the modelAIR value: aaa:100000@190800

I am using PHP and doing

(new MatchSearchQuery($airSearchString))->field("modelAIR")->boost(5);

but it seems to return results that have nothing to do with what I am searching for.

I tried adding ->analyzer("keyword"), but in that case it returns nothing.

I am escaping the ‘:’ character, so the request is made with aaa\:100000@190800. It looks like @ doesn’t need to be escaped.

Can you show all the code - where you define airSearchString and make the call? I’m not a python guy - I’m wondering why there is MatchSearchQuery($airSearchString) (with a dollar sign) instead of MatchSearchQuery(airSearchString). Is there a string substitute? Does that single back-slash get removed at some point? Are you using two back-slashes?

% cat p.py
airSearchString="aaa\\:100000@190800";
print(airSearchString);

% python3 p.py
aaa\:100000@190800

Python complains if only one back-slash is used (although the printed string still has the back-slash in it).

% cat p.py
airSearchString="aaa\:100000@190800";
print(airSearchString);
% python3 p.py
./p.py:1: SyntaxWarning: invalid escape sequence '\:'
  airSearchString="aaa\:100000@190800";
aaa\:100000@190800
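For the Python side of this aside, a raw string literal is one way to keep the single backslash without triggering the warning. A minimal sketch (the variable names are only illustrative):

```python
# Two ways to write a string that contains one literal backslash before the colon.
escaped = "aaa\\:100000@190800"   # backslash escaped explicitly
raw = r"aaa\:100000@190800"       # raw string: the backslash is taken literally

print(escaped)          # aaa\:100000@190800
print(escaped == raw)   # True
print(len(raw))         # 18
```

Both literals produce the same 18-character string, so either form should send the same text to the server.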

Hi, this is not Python, it is PHP:

$airSearchString = str_replace(
    ':',
    '\:',
    $searchConfig['searchString']
);

It is the equivalent of:

(new MatchSearchQuery("aaa\:123@123"))->field("modelAIR")->boost(5);

Can you show the rest of the code?

The issue is with your analyzer. Please use this tool (https://bleveanalysis.couchbase.com/analysis) to recreate your custom analyzer and test it. I did, and then supplied:

aaa:100000@190800

As you can see, the “:” and the “@” are removed.

image

Now let’s try a keyword analyzer

image
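Since the screenshots are not reproduced here, the difference between the two analyzers can be approximated in a few lines. This is only a rough sketch: `rough_unicode_tokens` is a hypothetical stand-in that keeps runs of ASCII letters and digits, not the real unicode segmentation rules, and the second document value is made up for illustration.

```python
import re

def rough_unicode_tokens(text):
    """Rough stand-in for a unicode tokenizer plus to_lower:
    keep runs of ASCII letters and digits, drop everything else."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_tokens(text):
    """The keyword analyzer emits the whole input as a single token."""
    return [text]

value = "aaa:100000@190800"
print(rough_unicode_tokens(value))  # ['aaa', '100000', '190800']
print(keyword_tokens(value))        # ['aaa:100000@190800']

# Hypothetical second document: it shares the token 'aaa' with the search text.
other = "aaa:555@777"
shared = set(rough_unicode_tokens(value)) & set(rough_unicode_tokens(other))
print(shared)  # {'aaa'}
```

A match query analyzes the search text as well and, with its default "or" operator, accepts documents that contain any of the resulting terms. Under that assumption, every document whose modelAIR starts with aaa would match, which would explain the unrelated results.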

Note that FTS uses Golang regex syntax. Using the on-prem or Capella UI, try an FTS query like:

{ "regexp": "aaa.[0-9]+.[0-9]+", "field": "modelAIR" }

I use “.” to match the “:” and the “@” characters; with the query above you will get your match.

I admit I had issues matching the “:” and “@” characters literally, so I used the Golang regex class for punctuation ([[:punct:]], equivalent to [!-/:-@[-`{-~]). The following will still work:

{ "regexp": "aaa[[:punct:]][0-9]+[[:punct:]][0-9]+", "field": "modelAIR" }
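The two patterns can be sanity-checked offline. The sketch below uses Python's `re` rather than Go's `regexp`, so it is only an approximation: Python has no `[[:punct:]]`, so the class is spelled out with the ASCII ranges quoted above, and `re.fullmatch` is used on the assumption that an FTS regexp query has to match the whole indexed term.

```python
import re

# Python's re has no [[:punct:]]; spell out the ASCII ranges quoted in the
# post: [!-/:-@[-`{-~]  (the inner "[" is escaped for Python).
PUNCT = r"[!-/:-@\[-`{-~]"

dot_pattern = r"aaa.[0-9]+.[0-9]+"
punct_pattern = rf"aaa{PUNCT}[0-9]+{PUNCT}[0-9]+"

value = "aaa:100000@190800"
print(bool(re.fullmatch(dot_pattern, value)))    # True
print(bool(re.fullmatch(punct_pattern, value)))  # True

# "." accepts any character as separator; the punctuation class does not.
loose = "aaaX100000Y190800"  # hypothetical value
print(bool(re.fullmatch(dot_pattern, loose)))    # True
print(bool(re.fullmatch(punct_pattern, loose)))  # False
```

The punctuation class is the stricter of the two, since “.” would also accept letters or digits in the separator positions.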

Next I used the hex escape in Golang regex syntax for “:”:

{ "regexp": "aaa\\x3A[0-9]+[[:punct:]][0-9]+", "field": "modelAIR" }

Because the pattern is inside a JSON string, I had to escape the backslash. Then the same for “@”:

{ "regexp": "aaa\\x3A[0-9]+\\x40[0-9]+", "field": "modelAIR" }
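The doubled backslash is purely a JSON string escape; the regex itself contains a single backslash before each x. A small sketch of that round trip, using Python's `json` and `re` modules as stand-ins for the query editor and the Go regex engine:

```python
import json
import re

# The regex as the engine should see it: one backslash before each "x".
pattern = r"aaa\x3A[0-9]+\x40[0-9]+"

# Serializing to JSON doubles each backslash automatically.
query = {"regexp": pattern, "field": "modelAIR"}
body = json.dumps(query)
print(body)  # {"regexp": "aaa\\x3A[0-9]+\\x40[0-9]+", "field": "modelAIR"}

# Parsing the JSON gives back the single-backslash pattern.
print(json.loads(body)["regexp"] == pattern)  # True

# \x3A is ":" and \x40 is "@", so the sample value matches.
print(bool(re.fullmatch(pattern, "aaa:100000@190800")))  # True
```

If the query is built with a JSON library rather than typed by hand, the pattern can be written with single backslashes and the library takes care of the escaping.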

Again, the backslash is escaped because the pattern is inside a JSON string. The query above is the final regex.

Here is your working index. Note that I overrode the field you are searching on to use a keyword analyzer:

image

If you just want a prefix such as aaa:100 (similar to the aaaa:100 at the start of this thread), try

{ "regexp": "aaa\\x3A100.+", "field": "modelAIR" }

or, if you are looking for a prefix of aaa:100@,

{ "regexp": "aaa\\x3A100\\x40.+", "field": "modelAIR" }
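To see which values each prefix pattern would accept, here is a quick offline check. It is a sketch under two assumptions: Python's `re.fullmatch` stands in for the FTS regexp query (taken to match the whole indexed term), and every sample value other than the first is hypothetical.

```python
import re

prefix_colon = r"aaa\x3A100.+"    # prefix "aaa:100" plus at least one more character
prefix_at = r"aaa\x3A100\x40.+"   # prefix "aaa:100@" plus at least one more character

samples = ["aaa:100000@190800", "aaa:100@1", "aaa:200@100", "aaa:100"]

for s in samples:
    print(s, bool(re.fullmatch(prefix_colon, s)), bool(re.fullmatch(prefix_at, s)))
# aaa:100000@190800 True False
# aaa:100@1 True True
# aaa:200@100 False False
# aaa:100 False False
```

Because the patterns end in “.+”, a value that is exactly the prefix (aaa:100) is not matched; “.*” would include it.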

Your final definition is below:

{
  "type": "fulltext-index",
  "name": "forum._default.models_meta",
  "sourceType": "gocbcore",
  "sourceName": "models_meta",
  "planParams": {
    "maxPartitionsPerPIndex": 1024,
    "indexPartitions": 1
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": ".*",
      "mode": "docid_regexp",
      "type_field": "type"
    },
    "mapping": {
      "analysis": {
        "analyzers": {
          "en-without-stop-words": {
            "token_filters": [
              "to_lower",
              "possessive_en"
            ],
            "tokenizer": "unicode",
            "type": "custom"
          }
        }
      },
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": true,
        "enabled": true,
        "properties": {
          "modelAIR": {
            "dynamic": false,
            "enabled": true,
            "fields": [
              {
                "analyzer": "keyword",
                "docvalues": true,
                "index": true,
                "name": "modelAIR",
                "store": true,
                "type": "text"
              }
            ]
          }
        }
      },
      "default_type": "_default",
      "docvalues_dynamic": false,
      "index_dynamic": true,
      "store_dynamic": false,
      "type_field": "_type"
    },
    "store": {
      "indexType": "scorch",
      "segmentVersion": 15
    }
  },
  "sourceParams": {}
}

The above only updates the analyzer for the field “modelAIR”. FTS is very powerful, but the devil is in the details.
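With modelAIR indexed by the keyword analyzer, the whole value is stored as a single term, so an exact match may not need a regex at all. The sketch below builds a request body for a term query, which is not analyzed; this approach was not tested in this thread, and the helper name `exact_match_query` is hypothetical.

```python
import json

def exact_match_query(field, value):
    """Build an FTS request body that looks up one unanalyzed term in one field."""
    return {"query": {"term": value, "field": field}}

body = json.dumps(exact_match_query("modelAIR", "aaa:100000@190800"))
print(body)  # {"query": {"term": "aaa:100000@190800", "field": "modelAIR"}}
```

Note that the value is passed as-is, without escaping the “:”, because a term query does not go through query-string parsing.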

Best

Jon Strabala
Principal Product Manager - Server
