Getting only partial results from FTS on a 2-node cluster

Hi,

We are thinking about replacing Elasticsearch with Couchbase FTS to reduce some complexity in our setup. So I’ve set up a test cluster with two Couchbase 5.5 nodes, both running the data and search services.

I then created a bucket, filled it with 200’000 test items and created a simple FTS index. “Indexing progress” stopped at about 50%, and querying in the GUI shows a red message “Partial [50%]” above the results. I tried to query with the golang library, and there too I only get half of the results. After testing various index settings etc. I logged in to the GUI on the second node. Everything looked similar there: the bucket is there and correctly balanced, and the FTS index was created there as well. But when I search, I get the other half of the results!
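The red “Partial [50%]” banner reflects the `status` object that FTS includes with a search response, which counts how many index partitions (pindexes) answered the query. A client can detect this situation itself; below is a minimal Go sketch. The response shape follows the usual FTS/bleve `status` object, but the concrete JSON values are made up to match the symptom, not captured from this cluster:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Status mirrors the "status" object of an FTS search response,
// which reports how many index partitions answered the query.
type Status struct {
	Total      int `json:"total"`
	Failed     int `json:"failed"`
	Successful int `json:"successful"`
}

// SearchResponse is a minimal subset of the full response body.
type SearchResponse struct {
	Status    Status `json:"status"`
	TotalHits int    `json:"total_hits"`
}

// IsPartial reports whether some pindexes failed to answer,
// i.e. the result set only covers part of the index.
func (r SearchResponse) IsPartial() bool {
	return r.Status.Failed > 0 || r.Status.Successful < r.Status.Total
}

func main() {
	// Illustrative response matching the symptom: 3 of 6 pindexes
	// (the ones on the other node) did not answer.
	raw := []byte(`{"status":{"total":6,"failed":3,"successful":3},"total_hits":100049}`)

	var resp SearchResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		panic(err)
	}
	// prints: partial=true coverage=3/6
	fmt.Printf("partial=%v coverage=%d/%d\n",
		resp.IsPartial(), resp.Status.Successful, resp.Status.Total)
}
```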

I then removed the search service on one of the nodes (but still have the data service on both); now FTS indexes all items correctly and progress shows 100%. So that’s good enough for my test scenarios. But for going to production I really need two (or more) FTS services for redundancy.

Am I doing something wrong? It looks like everything is correctly balanced, bucket and FTS. But a query only hits one node instead of both. Of course I could manually call both services and merge the results in my program, but that doesn’t look like a good solution.
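To illustrate why the manual approach is unattractive: merging would mean calling each node separately, deduplicating hits by document ID and re-sorting by score on the client, and even then scores computed per node are not directly comparable. A rough sketch of just the merge step (`Hit` is a hypothetical stand-in, not the SDK type):

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is a minimal stand-in for one search hit returned by a node.
type Hit struct {
	ID    string
	Score float64
}

// mergeHits combines partial result sets from several nodes,
// de-duplicating by document ID (keeping the highest score)
// and re-sorting by descending score.
func mergeHits(partial ...[]Hit) []Hit {
	best := map[string]float64{}
	for _, hits := range partial {
		for _, h := range hits {
			if s, ok := best[h.ID]; !ok || h.Score > s {
				best[h.ID] = h.Score
			}
		}
	}
	merged := make([]Hit, 0, len(best))
	for id, s := range best {
		merged = append(merged, Hit{ID: id, Score: s})
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i].Score > merged[j].Score })
	return merged
}

func main() {
	node1 := []Hit{{"doc1", 1.2}, {"doc3", 0.7}}
	node2 := []Hit{{"doc2", 0.9}, {"doc3", 0.5}}
	// doc1 1.2, doc2 0.9, doc3 0.7
	for _, h := range mergeHits(node1, node2) {
		fmt.Println(h.ID, h.Score)
	}
}
```

Even ignoring the score problem, paging and facets can’t be merged correctly this way, which is why the cluster’s own scatter/gather is the right place for this.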

Thanks,
Pascal

@gizmo74 which index format did you use? (the default one or the experimental scorch one?)

Hi Pascal,

“Indexing progress” stopped at about 50% clearly shows that, for some reason, indexing didn’t complete fully on either of the nodes. That explains the partial results when querying.

Please try to add back the FTS-enabled node and check the rebalance progress. Indexing and searching on a multi-node cluster is supposed to work. If the problem still persists, you may raise a support ticket.

cheers,

I tried both Moss and Scorch; same result for both :-o

I tried disabling/enabling the node and rebalancing many times this weekend, always with the same result: half of the index is on node1, the other half on node2. That part actually looks correct to me so far. I think for whatever reason node1 doesn’t know that the other half exists on node2, and vice versa.

Tried adding FTS back on node 2. Same result. Now I have:
indexing progress 50.18% (100’049 items) on node1, 49.82% (99’351 items) on node2.
Together that’s 199’400 items, exactly the number of items in the bucket. Of course it’s still not possible to query FTS properly: I get half of the results if I query node1, the other half if I query node2.

Does this help?

2018-08-20T15:32:01.543+02:00 [INFO] stats: {
"batch_bytes_added": 823415873,
"batch_bytes_removed": 823415873,
"fts:ftstest:avg_queries_latency": 11.493515,
"fts:ftstest:batch_merge_count": 0,
"fts:ftstest:doc_count": 199400,
"fts:ftstest:iterator_next_count": 0,
"fts:ftstest:iterator_seek_count": 0,
"fts:ftstest:num_bytes_live_data": 0,
"fts:ftstest:num_bytes_used_disk": 8039399819,
"fts:ftstest:num_files_on_disk": 0,
"fts:ftstest:num_mutations_to_index": 0,
"fts:ftstest:num_pindexes": 6,
"fts:ftstest:num_pindexes_actual": 6,
"fts:ftstest:num_pindexes_target": 6,
"fts:ftstest:num_recs_to_persist": 0,
"fts:ftstest:reader_get_count": 0,
"fts:ftstest:reader_multi_get_count": 0,
"fts:ftstest:reader_prefix_iterator_count": 0,
"fts:ftstest:reader_range_iterator_count": 0,
"fts:ftstest:timer_batch_store_count": 0,
"fts:ftstest:timer_data_delete_count": 0,
"fts:ftstest:timer_data_update_count": 199400,
"fts:ftstest:timer_opaque_get_count": 3072,
"fts:ftstest:timer_opaque_set_count": 2048,
"fts:ftstest:timer_rollback_count": 0,
"fts:ftstest:timer_snapshot_start_count": 1024,
"fts:ftstest:total_bytes_indexed": 1029966474,
"fts:ftstest:total_bytes_query_results": 15088,
"fts:ftstest:total_compaction_written_bytes": 20233294933,
"fts:ftstest:total_compactions": 0,
"fts:ftstest:total_queries": 1,
"fts:ftstest:total_queries_error": 0,
"fts:ftstest:total_queries_slow": 0,
"fts:ftstest:total_queries_timeout": 0,
"fts:ftstest:total_request_time": 11500669,
"fts:ftstest:total_term_searchers": 3,
"fts:ftstest:writer_execute_batch_count": 0,
"num_bytes_used_ram": 310270272,
"pct_cpu_gc": 0.013999333392995443,
"tot_http_limitlisteners_closed": 0,
"tot_http_limitlisteners_opened": 1,
"tot_https_limitlisteners_closed": 0,
"tot_https_limitlisteners_opened": 1,
"tot_remote_http": 0,
"tot_remote_http2": 1179,
"total_gc": 205
}
2018-08-20T15:32:23.329+02:00 [INFO] cbdatasource: server: 151.252.8.77:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_54820232-15c665e, worker, looping beg, vbucketState: "running" (has 1 vbuckets), 512
2018-08-20T15:32:23.330+02:00 [INFO] cbdatasource: server: 151.252.8.77:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_6ddbfb54-2d033a2b, worker, looping beg, vbucketState: "running" (has 169 vbuckets), 855-1023
2018-08-20T15:32:23.329+02:00 [INFO] cbdatasource: server: 151.252.8.76:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_54820232-15c665e, worker, looping beg, vbucketState: "running" (has 170 vbuckets), 342-511
2018-08-20T15:32:23.330+02:00 [INFO] cbdatasource: server: 151.252.8.77:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_aa574717-2dbd2106, worker, looping beg, vbucketState: "running" (has 171 vbuckets), 513-683
2018-08-20T15:32:23.330+02:00 [INFO] cbdatasource: server: 151.252.8.77:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_18572d87-634396b5, worker, looping beg, vbucketState: "running" (has 171 vbuckets), 684-854
2018-08-20T15:32:23.413+02:00 [INFO] cbdatasource: server: 151.252.8.76:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_13aa53f3-7f1c89e0, worker, looping beg, vbucketState: "running" (has 171 vbuckets), 0-170
2018-08-20T15:32:23.433+02:00 [INFO] cbdatasource: server: 151.252.8.76:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_f4e0a48a-1153932d, worker, looping beg, vbucketState: "running" (has 171 vbuckets), 171-341
2018-08-20T15:32:53.330+02:00 [INFO] cbdatasource: server: 151.252.8.76:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_54820232-15c665e, worker, looping beg, vbucketState: "running" (has 170 vbuckets), 342-511
2018-08-20T15:32:53.330+02:00 [INFO] cbdatasource: server: 151.252.8.77:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_54820232-15c665e, worker, looping beg, vbucketState: "running" (has 1 vbuckets), 512
2018-08-20T15:32:53.330+02:00 [INFO] cbdatasource: server: 151.252.8.77:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_18572d87-634396b5, worker, looping beg, vbucketState: "running" (has 171 vbuckets), 684-854
2018-08-20T15:32:53.331+02:00 [INFO] cbdatasource: server: 151.252.8.77:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_6ddbfb54-2d033a2b, worker, looping beg, vbucketState: "running" (has 169 vbuckets), 855-1023
2018-08-20T15:32:53.331+02:00 [INFO] cbdatasource: server: 151.252.8.77:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_aa574717-2dbd2106, worker, looping beg, vbucketState: "running" (has 171 vbuckets), 513-683
2018-08-20T15:32:53.413+02:00 [INFO] cbdatasource: server: 151.252.8.76:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_13aa53f3-7f1c89e0, worker, looping beg, vbucketState: "running" (has 171 vbuckets), 0-170
2018-08-20T15:32:53.434+02:00 [INFO] cbdatasource: server: 151.252.8.76:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_f4e0a48a-1153932d, worker, looping beg, vbucketState: "running" (has 171 vbuckets), 171-341
2018-08-20T15:33:23.331+02:00 [INFO] cbdatasource: server: 151.252.8.77:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_54820232-15c665e, worker, looping beg, vbucketState: "running" (has 1 vbuckets), 512
2018-08-20T15:33:23.331+02:00 [INFO] cbdatasource: server: 151.252.8.76:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_54820232-15c665e, worker, looping beg, vbucketState: "running" (has 170 vbuckets), 342-511
2018-08-20T15:33:23.331+02:00 [INFO] cbdatasource: server: 151.252.8.77:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_18572d87-634396b5, worker, looping beg, vbucketState: "running" (has 171 vbuckets), 684-854
2018-08-20T15:33:23.331+02:00 [INFO] cbdatasource: server: 151.252.8.77:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_6ddbfb54-2d033a2b, worker, looping beg, vbucketState: "running" (has 169 vbuckets), 855-1023
2018-08-20T15:33:23.331+02:00 [INFO] cbdatasource: server: 151.252.8.77:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_aa574717-2dbd2106, worker, looping beg, vbucketState: "running" (has 171 vbuckets), 513-683
2018-08-20T15:33:23.414+02:00 [INFO] cbdatasource: server: 151.252.8.76:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_13aa53f3-7f1c89e0, worker, looping beg, vbucketState: "running" (has 171 vbuckets), 0-170
2018-08-20T15:33:23.434+02:00 [INFO] cbdatasource: server: 151.252.8.76:11210, uprOpenName: fts:ftstest_7c7f28e0ea245d3b_f4e0a48a-1153932d, worker, looping beg, vbucketState: "running" (has 171 vbuckets), 171-341

I think I found the issue: port 18094 needs to be open for node-to-node communication, so the documentation is incorrect at
https://developer.couchbase.com/documentation/server/current/install/install-ports.html

@gizmo74 thanks! I filed a doc bug.

edit: note that the table is correct, but the port seems to be missing from the grouping earlier in the text.