N1QL query with many SEARCH(FTS) - concat results

maciej.tomaszewski · September 11, 2020, 5:18pm

Hi! I’m trying to figure out how to do the N1QL query that would use multiple SEARCH predicates. Currently it’s returning an empty set of data, while there are some results in both queries (when I run them separately). Any idea how to effectively join it? I would also like to use scoring so it needs to be sorted DESC by score.

 SELECT 
   SEARCH_META(s1) AS meta1, 
   SEARCH_META(s2) AS meta2
 FROM `bucket` i 
WHERE SEARCH(i, {"query": "sap* ras* irnq0oamyl*"}, {"index": "ind1", "out": "s1"}) 
AND SEARCH(i, {"query": "krnlpzfmyl*"}, {"index": "ind2", "out": "s2"})

If I run those SEARCH queries separately then I get results:

[
    {
        "meta1": {
            "id": "796aee75-fc25-4723-bcd9-fa21be52df5c",
            "score": 0.17407765752383322
        }
    },
    {
        "meta1": {
            "id": "b7abd3cc-a049-4c42-a0f4-0d22e33b475c",
            "score": 0.0647523879789343
        }
    },
    {
        "meta1": {
            "id": "bb55c912-045d-467e-9aaa-6457481179bb",
            "score": 0.052859253129068286
        }
    },
    {
        "meta1": {
            "id": "ff25fee1-cb48-4887-aa14-eb13b7289b69",
            "score": 0.230408143187017
        }
    }
    ]

And

 	[
    {
        "meta2": {
            "id": "b7abd3cc-a049-4c42-a0f4-0d22e33b475c",
            "score": 0.14433756470680237
        }
    },
    {
        "meta2": {
            "id": "bb55c912-045d-467e-9aaa-6457481179bb",
            "score": 0.3456122335610931
        }
    }
    ]

Using Couchbase Server v6.6

vsr1 · September 11, 2020, 5:32pm

The query doing AND that means per same document both conditions must be true.

NOTE: Each SEARCH() does separately and do Intersection on the document key. SERCH_SCORE(), SEARCH_META() mostly available when single index is used. In case of Intersect those might not available. In that case it returns MISSING.

Project META(i).id you should see results

You can use single index(search function) and do conjunction/disjunction in side search. cc @abhinav

abhinav · September 11, 2020, 5:41pm

Yes I’d recommend using a single FTS index as opposed to the two you have: ind1, ind2.
If you can share both your index definitions, I’d be able to recommend a unified definition and the SEARCH(…) query to best suit your scoring needs.

maciej.tomaszewski · September 11, 2020, 5:46pm

Thanks for an instant answer!
So my indexes are set on the same bucket. Nothing fancy, each index on a different field. That’s it. See screenshots below.

vsr1 · September 11, 2020, 5:51pm

In addition could you also add output of this in your request want to see what do u mean empty set of data. Is some fields missing or documents missing.

SELECT 
   MTEA(i).id,
   SEARCH_META(s1) AS meta1, 
   SEARCH_META(s2) AS meta2
 FROM `bucket` i 
WHERE SEARCH(i, {"query": "sap* ras* irnq0oamyl*"}, {"index": "ind1", "out": "s1"}) 
AND SEARCH(i, {"query": "krnlpzfmyl*"}, {"index": "ind2", "out": "s2"});

maciej.tomaszewski · September 11, 2020, 5:54pm

Documents are missing. Empty array.

{
  "results": []
}

abhinav · September 11, 2020, 6:00pm

Ok, I’d recommend creating a single index, with child fields field1 and field2 within the default mapping like you have. lets call this index “ind”. Also since you’re essentially doing wildcard searches - I’d recommend using the “keyword” analyzer for both fields.

Here’s your new query now …

 SELECT SEARCH_META() ,
 FROM `bucket` i 
WHERE SEARCH(i, {
	"query": {
		"conjuncts": [{
			"disjuncts": [{
				"field": "field1",
				"wildcard": "sap*"
			}, {
				"field": "field1",
				"wildcard": "ras*"
			}, {
				"field": "field1",
				"wildcard": "irnq0oamyl*"
			}]
		}, {
			"field": "field2",
			"wildcard": "krnlpzfmyl*"
		}]
	}
});

vsr1 · September 11, 2020, 6:00pm

Hum. It should have the following two documents .

        "id": "b7abd3cc-a049-4c42-a0f4-0d22e33b475c",
        "id": "bb55c912-045d-467e-9aaa-6457481179bb",

Try the following see if you get expected documents.

SELECT META().id, *
 FROM `bucket` i 
WHERE SEARCH(i, {"query": "sap* ras* irnq0oamyl*"}, {"index": "ind1", "out": "s1"}) ;

SELECT META().id, *
 FROM `bucket` i 
WHERE  SEARCH(i, {"query": "krnlpzfmyl*"}, {"index": "ind2", "out": "s2"});

maciej.tomaszewski · September 11, 2020, 6:13pm

Alright, so it’s not that simple comparing to a standard FTS query. But anyway I got the point now!
By the way:

why would you recommend keyword analyzer as a default one?
can I add “fuzziness”: “2” to this query or should I use direct match instead of wildcards?
how about the performance? is there any difference comparing to a pure FTS query?

abhinav · September 11, 2020, 6:20pm

A wildcard query is a non-analytic query. Meaning it does not apply any analysis to the search terms. Using a keyword analyzer to index your data will essentially preserve the content as is so the search terms from non-analytic queries can be found.

“fuzziness”: 2 can be added to match queries not “wildcards”. The performance of the query will directly depend on the number of tokens your search criteria will match. So I’d advise you to design your query as narrow and precise as possible for better performance.

FTS Queries via N1QL will go through an additional hop of N1QL getting results from FTS and then delivering them to you, so you may notice (very slightly) higher latencies as opposed to directly running FTS queries. The advantages of using N1QL for your FTS queries are multi-fold - you will be able to combine results from multiple indexes (FTS or GSI) for one.

vsr1 · September 11, 2020, 7:33pm

@maciej.tomaszewski,

{“query”: “sap* ras* irnq0oamyl*”} is not a valid query for the SEARCH function. It can be either of these …

SELECT 
   SEARCH_META(s1) AS meta1, 
   SEARCH_META(s2) AS meta2
 FROM `bucket` i 
WHERE SEARCH(i, "sap* ras* irnq0oamyl*", {"index": "ind1", "out": "s1"}) 
AND SEARCH(i,  "krnlpzfmyl*", {"index": "ind2", "out": "s2"});

OR

SELECT 
   SEARCH_META(s1) AS meta1, 
   SEARCH_META(s2) AS meta2
 FROM `bucket` i 
WHERE SEARCH(i, {"query":{"query":"sap* ras* irnq0oamyl*"}}, {"index": "ind1", "out": "s1"}) 
AND SEARCH(i,  {"query":{"query":"krnlpzfmyl*"}}, {"index": "ind2", "out": "s2"});

maciej.tomaszewski · September 11, 2020, 7:37pm

Oh… funny think but you are right! It worked when I dropped "query" part for the 2nd argument. Now it works as expected but I think I’m gonna switch to what @abhinav proposed. It looks like a more advanced solution.

vsr1 · September 11, 2020, 7:39pm

Agree it. I was more looking why it gave empty result.

maciej.tomaszewski · September 11, 2020, 7:42pm

You got my respect for that! Really good job in debugging my problem
Based on what @abhinav wrote ...N1QL for your FTS queries are multi-fold - you will be able to combine results from multiple indexes (FTS or GSI) for one. I guess there is also another adventage - using WHERE conditions which are not available in FTS itself. Like narrowing down the number of indexed documents to only those with let’s say “company_id” = 1 and “country” = “us” (while all of them exists in the same index). Am I right or is there another way to achieve that?

vsr1 · September 11, 2020, 7:49pm

My advice is if you can achieve directly FTS index (searching from the SDKs, REST API), go with it.

N1QL it has other advantages , If you want Join, aggregation etc… i.e Instead of GSI leverage FTS index do complete SQL.

The following articles gives details when to use N1QL GSI, N1QL FTS

Topic		Replies	Views
Combining FTS Search with Parent-Child Document Retrieval in N1QL SQL++ query , n1ql , index	20	649	February 8, 2024
N1QL + FTS slow SQL++	6	424	July 16, 2023
Is there any docs or samples how to Integrate a Geo FTS Search and N1QL Full Text Search n1ql	16	1017	March 16, 2021
FTS over multiple collections / N1QL JOINing multiple collections that match given SEARCH Full Text Search	14	1463	January 24, 2023
Full text search and JOIN Full Text Search	2	2026	December 26, 2018

N1QL query with many SEARCH(FTS) - concat results

Related topics