FTS phrase match explanation

@abhinav can you please explain to me this FTS phrase match?

the string is: “With zombie make up on Billy”

the search phrase is: “make out”

{"query": {"match_phrase": "make out", "field": "pp"}, "score": "none"}

I am trying to understand why there is a match on the above text

this is the index definition:

{
  "type": "fulltext-index",
  "name": "image_fts-v2",
  "uuid": "4e8a00692c191b1f",
  "sourceType": "gocbcore",
  "sourceName": "images_fts",
  "sourceUUID": "446eb8f81d4c800ccad037596d85a254",
  "planParams": {
    "maxPartitionsPerPIndex": 256,
    "indexPartitions": 4
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "type_field",
      "type_field": "m.t"
    },
    "mapping": {
      "analysis": {},
      "default_analyzer": "en",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "default_analyzer": "en",
        "dynamic": false,
        "enabled": false
      },
      "default_type": "_default",
      "docvalues_dynamic": false,
      "index_dynamic": false,
      "store_dynamic": false,
      "type_field": "_type",
      "types": {
        "fts": {
          "dynamic": false,
          "enabled": true,
          "properties": {
            "cd": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "cd",
                  "type": "number"
                }
              ]
            },
            "dl": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "dl",
                  "type": "number"
                }
              ]
            },
            "ih": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "ih",
                  "type": "number"
                }
              ]
            },
            "ii": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "keyword",
                  "index": true,
                  "name": "ii",
                  "type": "text"
                }
              ]
            },
            "im": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "keyword",
                  "index": true,
                  "name": "im",
                  "type": "text"
                }
              ]
            },
            "iw": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "iw",
                  "type": "number"
                }
              ]
            },
            "mi": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "mi",
                  "type": "number"
                }
              ]
            },
            "mt": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "mt",
                  "store": true,
                  "type": "number"
                }
              ]
            },
            "np": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "include_term_vectors": true,
                  "index": true,
                  "name": "np",
                  "type": "text"
                }
              ]
            },
            "nv": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "index": true,
                  "name": "nv",
                  "type": "number"
                }
              ]
            },
            "pi": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "pi",
                  "store": true,
                  "type": "boolean"
                }
              ]
            },
            "pp": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "include_term_vectors": true,
                  "index": true,
                  "name": "pp",
                  "type": "text"
                }
              ]
            },
            "pv": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "index": true,
                  "name": "pv",
                  "type": "number"
                }
              ]
            },
            "s": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "s",
                  "type": "number"
                }
              ]
            },
            "st": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "st",
                  "type": "boolean"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "indexType": "scorch",
      "segmentVersion": 15
    }
  },
  "sourceParams": {}
}

“out” is a stop word in the en analyzer. The same analyzer is applied to your search phrase, so “out” is dropped and the phrase match ends up looking only for “make”.
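To make the effect concrete, here is a rough Python sketch of what happens to the document text and to the search phrase. It is not the real en analyzer: the stop word set is a small illustrative subset I am assuming for this example, stemming is skipped, and token positions are ignored. The function names are made up for the illustration.

```python
import re

# Illustrative subset only (assumption); the real stop_en list is much longer.
STOP_EN_SUBSET = {"with", "up", "on", "out"}


def analyze(text, stop_words=STOP_EN_SUBSET):
    """Lowercase, split on word characters, then drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stop_words]


def phrase_matches(doc_tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside doc_tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


text = "With zombie make up on Billy"
phrase = "make out"

# With stop words removed, the phrase shrinks to a single term.
doc_tokens = analyze(text)
query_tokens = analyze(phrase)
print(doc_tokens)    # ['zombie', 'make', 'billy']
print(query_tokens)  # ['make']
print(phrase_matches(doc_tokens, query_tokens))  # True

# Without stop word removal, "out" stays in the phrase and nothing matches.
no_stop_match = phrase_matches(analyze(text, set()), analyze(phrase, set()))
print(no_stop_match)  # False
```

The second half shows why dropping the stop word filter changes the result: “make out” stays a two-term phrase and no longer matches “make up”.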

can I exclude it as a stop word?

Not while using the en analyzer. What you can do is create a custom analyzer with all of the en analyzer’s components except the stop word filter (stop_en), and use that for the field pp.

So your custom analyzer’s components would be …

  • unicode tokenizer
  • possessive_en token filter
  • to_lower token filter
  • stemmer_en_snowball token filter
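As a sketch of how those components could be wired into a copy of the index definition, the Python below adds a custom analyzer under `params.mapping.analysis` and points the `pp` field at it. Treat the details as assumptions: the analyzer name `en_no_stop` is made up, and the key layout (`analyzers`, `type: custom`, `tokenizer`, `token_filters`) is my understanding of the index definition JSON. Creating the analyzer in the UI and comparing against the exported definition is the safest check.

```python
import copy
import json

CUSTOM_ANALYZER_NAME = "en_no_stop"  # hypothetical name, pick your own


def with_custom_analyzer(index_def, type_name="fts", prop="pp"):
    """Return a copy of index_def whose field uses an en analyzer without stop_en."""
    new_def = copy.deepcopy(index_def)  # leave the original definition untouched
    mapping = new_def["params"]["mapping"]

    # Register the custom analyzer: the en components minus the stop_en filter.
    analyzers = mapping.setdefault("analysis", {}).setdefault("analyzers", {})
    analyzers[CUSTOM_ANALYZER_NAME] = {
        "type": "custom",
        "tokenizer": "unicode",
        "token_filters": ["possessive_en", "to_lower", "stemmer_en_snowball"],
    }

    # Point every field entry of the chosen property at the new analyzer.
    for field in mapping["types"][type_name]["properties"][prop]["fields"]:
        field["analyzer"] = CUSTOM_ANALYZER_NAME
    return new_def


# Trimmed-down stand-in for the index definition posted above.
index_def = {
    "name": "image_fts-v2",
    "params": {
        "mapping": {
            "analysis": {},
            "types": {
                "fts": {
                    "properties": {
                        "pp": {
                            "fields": [
                                {"analyzer": "en", "name": "pp", "type": "text"}
                            ]
                        }
                    }
                }
            },
        }
    },
}

patched = with_custom_analyzer(index_def)
print(json.dumps(patched["params"]["mapping"]["analysis"], indent=2))
```

The helper only rewrites the JSON. The result still has to be saved as a new or updated index, which triggers a build.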

“out” is the only stop word?

No, there are several others, and they will also end up getting indexed. Here’s the list of stop words for en (remember, these are the ones the analyzer drops and does not index):

if I add this tokenizer, it needs to reindex my entire database, right?

Correct. Any change to the index definition that changes the mapping will cause an index rebuild.

OK, I understand. I am now creating a clone of the index with the new analyzer. Thanks for the details. I will let you know how it goes.
