What criteria trigger a rebuild of the Full Text Search index?

Global Secondary Index updates are always asynchronous. When a large number of documents are updated, N1QL queries are not blocked; the query results are eventually consistent and may lag depending on the index backlog.

Does eventual consistency also apply to queries against an FTS index? I have read the Couchbase documentation, and it is not clear to me what triggers an FTS index rebuild, or when. Is the FTS index rebuilt whenever a single document is added or updated? Or is there some kind of smart scheduling, for example the FTS indexing service waking up every 15 minutes to check for new document updates, or waiting for at least 100 document updates before rebuilding the index? Since our customers may upload lots of documents at once, we are worried that this could significantly slow down FTS indexing, block queries against the FTS index, or stop full text search from working at all.

Thanks

Hey @jessyang, on new mutations the FTS index will continually be updated (it isn’t re-built).
When the index is building, search requests/queries will be allowed on data that has already been indexed.

Hi @abhinav, does that mean that every time a single document is added, updated, or deleted, an update of the FTS index is triggered?

This part is partially correct. When querying, it is possible to specify ‘at plus’ or ‘request plus’ to have the query block until it reaches the point specified. See the documentation in whatever SDK you’re using for how to apply it.

It’s all event driven, not timer driven. So, once a document has been propagated from the node responsible for it to an indexer, processed, and written to the index, every subsequent FTS query will consider that doc. How long this takes depends on the resources available. Assuming small documents and available resources, it can happen in milliseconds.

Yes, whenever a document is added, updated, or deleted in the Couchbase bucket, the mutation is streamed to the FTS index that is connected to it. The FTS index is updated when the mutation is received.
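To make the "updated per mutation, not rebuilt" point concrete, here is a minimal sketch in Python. It is a toy in-memory inverted index, not Couchbase code: the class `ToyFtsIndex` and its methods are hypothetical names chosen for illustration. Each add, update, or delete touches only the postings of the affected document.

```python
from collections import defaultdict


class ToyFtsIndex:
    """Hypothetical in-memory stand-in for an FTS index (not Couchbase code)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of docs containing it
        self.doc_terms = {}               # doc id -> terms currently indexed

    def on_mutation(self, doc_id, body=None):
        """Apply one mutation as it arrives; body=None represents a delete."""
        # Drop the terms of the previous version of this document, if any.
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
        # Index the new version; a delete stops after the removal above.
        if body is not None:
            terms = set(body.lower().split())
            self.doc_terms[doc_id] = terms
            for term in terms:
                self.postings[term].add(doc_id)

    def search(self, term):
        """Return the ids of documents that currently contain the term."""
        return sorted(self.postings.get(term.lower(), set()))


index = ToyFtsIndex()
index.on_mutation("doc1", "quarterly sales report")
index.on_mutation("doc2", "annual sales summary")
hits_before = index.search("sales")

index.on_mutation("doc1", "quarterly revenue report")  # update
index.on_mutation("doc2")                              # delete
hits_after = index.search("sales")
```

In this toy model a search issued between two mutations simply sees whatever has been applied so far, which mirrors the eventual consistency described above.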

@ingenthr @abhinav Thank you so much for the information.

I have another question on “scan consistency”. I know that we can set one of the following “scan consistency” flags (“Not_bounded”, “At_plus” or “Request_plus”) for a N1QL query.

For a query against the FTS index, can we also set a similar “scan consistency” flag (“Not_bounded”, “At_plus” or “Request_plus”)?

Based on the last comment, I would assume that the event is triggered by each individual document added/updated/deleted. (please correct me if I am wrong).

In our application’s scenario, our customers use our application UI to upload hundreds of documents. Our application invokes the Couchbase API to insert the documents one at a time. After each document is added to the bucket, depending on the available resources, the FTS index may reflect this document update after one or several milliseconds. I think this “eventual consistency” behaviour of FTS is very similar to what I have read about querying against the Global Secondary Index (https://docs.couchbase.com/server/6.0/learn/services-and-indexes/indexes/index-replication.html#index-consistency). In terms of index consistency, I just want to confirm that the behaviour is the same for both the Global Secondary Index and the Full Text Search index. Please let me know if you think these two types of indexes differ in index consistency.

Yes, the behavior is very similar and they use many of the same mechanisms differing only where appropriate to the kind of index being created. And I do believe (documentation confirms) that the FTS service has a similar at_plus consistency level. The SDK you’re using should allow that to be set and @abhinav can probably describe further if needed.

@jessyang NOT_BOUNDED and AT_PLUS consistency flags are supported by FTS currently. We intend to support REQUEST_PLUS as well, but I cannot give you an ETA for it at the moment.
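For anyone calling the Search REST endpoint directly, the sketch below builds a request body with and without AT_PLUS. It is a hedged illustration only: the helper `build_fts_request`, the index name `docs_fts`, and the vector values are hypothetical, and the `ctl.consistency` layout (a `level` plus per-index `vectors` keyed by "vbucket/vbuuid" with a sequence number) is written as I recall it from the Search REST documentation, so please verify it against your server version. With an SDK you would normally pass the mutation tokens through the SDK's own search options instead.

```python
import json


def build_fts_request(index_name, match_text, mutation_vector=None, timeout_ms=10000):
    """Build a search request body (hypothetical helper, no network I/O).

    mutation_vector maps "vbucket/vbuuid" to the sequence number the index
    must have processed before answering. Leaving it out gives the default
    NOT_BOUNDED behaviour.
    """
    body = {
        "query": {"match": match_text},
        "ctl": {"timeout": timeout_ms},
    }
    if mutation_vector:
        # Assumed field layout; check the Search REST docs for your version.
        body["ctl"]["consistency"] = {
            "level": "at_plus",
            "vectors": {index_name: dict(mutation_vector)},
        }
    return body


# NOT_BOUNDED: answer from whatever has been indexed so far.
not_bounded = build_fts_request("docs_fts", "invoice")

# AT_PLUS: wait until the index has caught up to the given mutation.
# The key and sequence number here are made-up placeholder values.
at_plus = build_fts_request("docs_fts", "invoice", {"12/205096593892159": 7})

print(json.dumps(at_plus, indent=2))
```

The practical pattern is to use AT_PLUS only for searches that must see a write the same user just made, and leave bulk or background searches at NOT_BOUNDED so they are never held up by the backlog.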

Hi @abhinav, let’s say that a document (document A) has been updated in a bucket, and a document mutation event is generated for it. At this moment, the nodes that run the indexing service suddenly crash (e.g. a power outage), so the index has not yet been updated with the latest document changes. What happens when the nodes that run the indexing service come back up? Will the mutation event for document A be lost?

Hey @jessyang, no the mutation will not be lost.

When the indexing node goes down and comes back up, the connection to the source will be re-initiated. At this point the indexing node will let the source know of the last mutation (identified by a monotonically increasing mutation id) that was received and persisted. The source will start streaming items from that id, to ensure that the destination (in this case, the indexing node) is always in sync with the source.
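The resume behaviour described above can be sketched as a small Python simulation. This is a simplified model under stated assumptions, not the actual protocol: `Source` and `Indexer` are hypothetical classes, and a single global counter stands in for the mutation id, whereas the real system tracks this per partition.

```python
class Source:
    """Hypothetical data node keeping an ordered log of mutations."""

    def __init__(self):
        self.log = []  # list of (seqno, doc_id, body)

    def write(self, doc_id, body):
        seqno = len(self.log) + 1  # monotonically increasing mutation id
        self.log.append((seqno, doc_id, body))
        return seqno

    def stream_from(self, last_seqno):
        """Return every mutation newer than the id the indexer reports."""
        return [m for m in self.log if m[0] > last_seqno]


class Indexer:
    """Hypothetical indexing node; both fields model persisted state."""

    def __init__(self):
        self.persisted_seqno = 0  # last mutation received and persisted
        self.indexed = {}         # doc id -> indexed body

    def catch_up(self, source):
        """(Re)connect and apply everything after the persisted id."""
        for seqno, doc_id, body in source.stream_from(self.persisted_seqno):
            self.indexed[doc_id] = body
            self.persisted_seqno = seqno


source = Source()
indexer = Indexer()

source.write("A", "version 1")
indexer.catch_up(source)        # index now holds version 1 of A

# The indexing node is down while document A is updated.
source.write("A", "version 2")
stale = indexer.indexed["A"]    # still the old version

# The node comes back, reports seqno 1, and the source resumes from there.
indexer.catch_up(source)
fresh = indexer.indexed["A"]
```

Because the source keeps the mutation and the indexer only advances its persisted id after applying one, the update made during the outage is replayed rather than lost.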