Combining FTS Search with Parent-Child Document Retrieval in N1QL

So, we have an FTS index for searching texts on several fields in a collection called as content.

when we run a query like this:

SELECT SEARCH_META().id as id, 
SEARCH_META().score as score, 
item.`type` as `type`,
item.child_type as child_type
  FROM main._default.content AS item USE INDEX(USING FTS)
  WHERE module_identifier="eveolution.giltazell.philadelphia.robbanks" 
    AND SEARCH(item, { 
      "fields": [ "*" ], 
      "query": {
        "match_phrase": "section 1"
      }
    });

it returns a result like following (notice that some of them are of type child and some are parent):

[
  {
    "child_type": "image",
    "id": "c6d86655-7d3b-4d3d-9ac5-e551aa28ca84",
    "score": 2.794578688163785,
    "type": "child"
  },
  {
    "child_type": "text",
    "id": "bce1a9e3-0f96-4cc6-9b18-4a656a608f8d",
    "score": 2.4201761802623114,
    "type": "child"
  },
  {
    "id": "00e558e8-d793-4e85-aac8-7bfbd2a68654",
    "score": 2.4201761802623114,
    "type": "parent"
  }
]

Now, for each of the child items in the above result, I need to get the parent using the following query which uses a different global secondary index:

SELECT *
  FROM `main`.`_default`.`content` AS `parent`
    WHERE ANY `item` IN `parent`.`children` SATISFIES `item`.`id` = "<child-id>" END;

I am trying to do all of this in a single inner join query where I could get the parent_id along with id for child types.

Any expert guidance on this would be great.

Could you not join them like this:

SELECT SEARCH_META(item).id as id
 ,SEARCH_META(item).score as score
 ,item.`type` as `type`
 ,item.child_type as child_type
 ,META(parent).id as parent_id
  FROM main._default.content AS item USE INDEX(USING FTS)
       ,main._default.content AS parent 
  WHERE item.module_identifier="eveolution.giltazell.philadelphia.robbanks" 
  AND SEARCH(item,{"fields":["*"],"query":{"match_phrase":"section 1"}})
  AND ANY v IN `parent`.`children` SATISFIES v.`id` = SEARCH_META(item).id END
;

?

(You have “*” in your second query but note “parent_id” as the data you’re interested in. I’ve assumed the parent document ID (meta().id) is what you’re after, if it is in fact a field called “id” in the parent document, you could replace META(parent).id with parent.id as parent_id, etc.)

HTH.

[Edit: I’ve assumed you’re using version 7.1 or later.]

If not on 7.1+, then:

SELECT SEARCH_META(item).id as id
 ,SEARCH_META(item).score as score
 ,item.`type` as `type`
 ,item.child_type as child_type
 ,META(parent).id as parent_id
  FROM main._default.content AS item USE INDEX(USING FTS)
       JOIN main._default.content AS parent ON ANY v IN `parent`.`children` SATISFIES v.`id` = SEARCH_META(item).id END
  WHERE item.module_identifier="eveolution.giltazell.philadelphia.robbanks"
  AND SEARCH(item,{"fields":["*"],"query":{"match_phrase":"section 1"}})
  AND ANY v IN `parent`.`children` SATISFIES v.`id` = SEARCH_META(item).id END

would be the equivalent with ANSI JOIN syntax.

Many thanks for quick response.

I would say this largely works. One thing it cannot do is, when there are search results with parent entries, it doesn’t get returned.

May be you are looking for LEFT OUTER JOIN. Add LEFT before JOIN

We are on 7.2.2. In the query there is JOIN keyword.

Both syntaxes work for 7.2.2; the comma syntax is only applicable to newer versions.

SELECT SEARCH_META(item).id as id
 ,SEARCH_META(item).score as score
 ,item.`type` as `type`
 ,item.child_type as child_type
 ,META(parent).id as parent_id
  FROM main._default.content AS item USE INDEX(USING FTS)
       LEFT JOIN main._default.content AS parent ON ANY v IN `parent`.`children` SATISFIES v.`id` = SEARCH_META(item).id END
  WHERE item.module_identifier="eveolution.giltazell.philadelphia.robbanks"
  AND SEARCH(item,{"fields":["*"],"query":{"match_phrase":"section 1"}})
  /* EDIT: this is an error and shouldn't be included: AND ANY v IN `parent`.`children` SATISFIES v.`id` = SEARCH_META(item).id END */

The ANSI syntax LEFT JOIN will include results from “item” that have no match in “parent”; parent_id will be missing for those results.

HTH.

Somehow this query doesn’t respond with items of type parent where the search match are found.

When I run a simple search query like this:

SELECT 
  item.*
FROM 
  content AS item USE INDEX(USING FTS)
WHERE 
  module_identifier = "com.company.platform.compliance"
AND SEARCH(item, { 
    "fields": [ "type", "child_type", "title", "text" ],
    "query": {
      "match": "audio"
    }
  });

It returns with results of all different types.

However when I use the search query you gave me, it returns results only of type child.

Related to this, is the word “image” a reserved word? When I search for the word “image”, I get back zero results. I expect atleast 50 results.

“image” is not a reserved word in SQL++. If the problem is that SEARCH() isn’t locating anything with the word “image”, it could be down to how your FTS index was built/defined. Perhaps if you could share your index definition we might be able to spot something.

However when I use the search query you gave me, it returns results only of type child.

I presume this is the LEFT JOIN version you’re referring to?

Sorry, I spot a filter issue - please try:

SELECT SEARCH_META(item).id as id
 ,SEARCH_META(item).score as score
 ,item.`type` as `type`
 ,item.child_type as child_type
 ,META(parent).id as parent_id
  FROM main._default.content AS item USE INDEX(USING FTS)
       LEFT JOIN main._default.content AS parent ON ANY v IN `parent`.`children` SATISFIES v.`id` = SEARCH_META(item).id END
  WHERE item.module_identifier="eveolution.giltazell.philadelphia.robbanks"
  AND SEARCH(item,{"fields":["*"],"query":{"match_phrase":"section 1"}})

The SEARCH() is the search you listed originally augmented with the outer join to look-up the parent_id. Should it not work, you can comment out the META(parent)... line and the LEFT JOIN... line and it should return the item results you expect; can you verify this is the case?

Success! This worked.

Here is the FTS definition:

{
  "type": "fulltext-index",
  "name": "index-full-text-search",
  "sourceType": "gocbcore",
  "sourceName": "main",
  "sourceUUID": "43526ea7c789c895eea58b96fd1a9925",
  "planParams": {
    "maxPartitionsPerPIndex": 1024,
    "indexPartitions": 1
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "scope.collection.type_field",
      "type_field": "type"
    },
    "mapping": {
      "analysis": {},
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": false,
        "enabled": false
      },
      "default_type": "_default",
      "docvalues_dynamic": false,
      "index_dynamic": false,
      "store_dynamic": false,
      "type_field": "_type",
      "types": {
        "_default.content": {
          "dynamic": false,
          "enabled": true,
          "properties": {
            "audio_description": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "audio_description",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "image_description": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "image_description",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "module_identifier": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "index": true,
                  "name": "module_identifier",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "text": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "text",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "title": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "title",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "type": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "index": true,
                  "name": "type",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "video_description": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "video_description",
                  "store": true,
                  "type": "text"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "indexType": "scorch",
      "segmentVersion": 15
    }
  },
  "sourceParams": null
}

I expect the following N1QL query to return atleast 50 results. It does in CouchbaseLite.

SELECT 
  item AS item,
  {
    "module_identifier": parent.module_identifier,
    "title": parent.title,
    "type": parent.type,
    "id": META(parent).id
  } AS parent
FROM 
  main._default.content AS item USE INDEX(USING FTS) 
  LEFT JOIN main._default.content AS parent ON ANY v IN parent.children SATISFIES v.id = SEARCH_META(item).id END 
WHERE 
  item.module_identifier="com.comp.platform.compliance" 
AND 
  SEARCH(item, {
    "query":{"match":"image"}
  })

Did removal of the outer join not change the outcome?

With LEFT JOIN I get parent type and child type results. That outcome changed and it was a success.

Even with this updated query, searching for the term image returns an empty list.

Sorry, I had missed the success part. Glad the outer join is working as expected.

How is your “en” analyser defined ? (Seems likely that the analysis is not including the term.)

The analyzer is keyword

@abhinav could you help here?

Using https://bleveanalysis.couchbase.com/analysis on the text “an image containing a fox”, the “en” analyser shows “imag”, “contain” & “fox” as the keywords (i.e. roots).

If you use the “standard” analyser you’ll get “image”, “containing” & “fox”. Using the standard analyser for your index will let you match the term “image” - this may be an interim workaround for you until @abhinav et. al. can provide some input.

You could also try to change the _default.content analyser to “en” too; in my experimentation that’ll allow matching using your query.

1 Like

Thanks for sharing that link. Very handy.

You are right. If I change every analyser to en, I get results for image. I get results if every analyzer is changed to standard as well. Prefix queries doesn’t work in either of them.

@abhinav’s comment here gave some clue on how things work.

To support wildcard queries from N1QL, fields have to be indexed using the keyword analyser. After converting the index to use keyword analyser on every field, I still don’t get expected results.

While running FTS search requests from N1QL

Does it mean that the FTS behavior is different while running from N1QL and running a query directly in the search UI? what’s the behavior when SDK is used?

Trying to gather everything that’s been discussed so far, I’ll state a few points here and my hope is they’ll help you understand the behavior you observe.

  • FTS supports 2 categories of queries - analytic (match, match_phrase) and non analytic (the rest of them) queries.

    • Analytic queries obtain the analyzer for the field to apply on the search criteria before running a search.
  • N1QL enforces a tighter contract with search as opposed to using using search directly via HTTP or SDK.

    • This means for non-analytic queries to obtain results the case will need to match which however the analyzer you’ve used tokenizes content. @dh already pointed out https://bleveanalysis.couchbase.com/analysis to you which you can use to test various analyzer behavior.
    • We had a restriction in place with 7.0+ that restricts the user of non-analytic queries (from N1QL) over fields that use keyword analyzers only. This should explain the behavior you see with prefix or wildcard queries.
    • In a future release (7.6), we’re dropping this constraint for more flexibility.
  • If you do not specify the field name in your query - the default_analyzer from your index definition is applied over your search criteria before running search.

"default_analyzer": "standard"
  • The difference between the en analyzer and the standard analyzer is with en we index the root of the term obtained using the english snowball stemmer. This should explain why you see tokens imag and image with different analyzers.

@PShri I’m not sure if I’ve answered all your questions here, so please let me know of any other concerns that haven’t been addressed.

@abhinav your explanation helped me understand many individual concepts, though I’m still working to piece them together holistically. A matrix comparing analyzers with examples would be great. I understand it’s a big ask, so no worries if it’s not feasible. Thanks for the in-depth response! I’ll reach out with any specific questions later. Currently, using the SDK for search with queries like ima* is serving my needs.