Hello everyone!
I must be doing something wrong, but my Couchbase Elasticsearch connector (cbes) keeps creating a duplicate document in Elasticsearch on every document mutation in Couchbase.
It looks like the connector is not upserting the document, but instead writes each mutation under a new Elasticsearch document ID.
Here is what the logDocumentLifecycle logs show for one such mutation:
13:27:25.142 [dcp-io-7-1] INFO c.c.c.e.DocumentLifecycle - {"milestone":"RECEIVED_FROM_COUCHBASE","tracingToken":30750,"documentId":"_default._default.client-unit:33009682:646:15146049:BIOLITE","revision":2,"type":"mutation","partition":609,"sequenceNumber":29251,"assignedToWorker":1,"usSinceCouchbaseChange(might be inaccurate before Couchbase 7)":78833,"usSinceReceipt":102}
13:27:25.142 [es-worker-1] INFO c.c.c.e.DocumentLifecycle - {"milestone":"MATCHED_TYPE_RULE","tracingToken":30750,"documentId":"_default._default.client-unit:33009682:646:15146049:BIOLITE","elasticsearchIndex":"shs-client-units","typeConfig":"TypeConfig{index=shs-client-units, pipeline=cbes-filter, ignore=false, ignoreDeletes=false, matchOnQualifiedKey=false, matcher=prefix='client-unit'; qualifiedKey=false}","usSinceReceipt":609}
13:27:25.142 [es-worker-1] INFO c.c.c.e.DocumentLifecycle - {"milestone":"ELASTICSEARCH_WRITE_STARTED","tracingToken":30750,"documentId":"_default._default.client-unit:33009682:646:15146049:BIOLITE","attempt":1,"usSinceReceipt":976}
13:27:25.160 [es-worker-1] INFO c.c.c.e.DocumentLifecycle - {"milestone":"ELASTICSEARCH_WRITE_SUCCEEDED","tracingToken":30750,"documentId":"_default._default.client-unit:33009682:646:15146049:BIOLITE","usSinceReceipt":18451}
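In case it helps anyone follow a single mutation through logs like these, here is a minimal Python sketch that groups the lifecycle milestones by tracingToken. It assumes the line format shown above (a JSON payload after the first " - "), and the function name is just something I made up for illustration; it is not part of the connector.

```python
import json
from collections import defaultdict


def milestones_by_token(log_lines):
    """Group lifecycle milestones by tracingToken, in order of appearance."""
    grouped = defaultdict(list)
    for line in log_lines:
        # Assumption: the JSON payload follows the first " - " separator.
        _, sep, payload = line.partition(" - ")
        if not sep:
            continue
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not a lifecycle line, skip it
        if not isinstance(event, dict):
            continue
        if "tracingToken" in event and "milestone" in event:
            grouped[event["tracingToken"]].append(event["milestone"])
    return dict(grouped)
```

Feeding it the four lines above should give a single entry for token 30750 with the milestones in the order they were logged.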
The connector is clearly able to tell that the update is a mutation from Couchbase, but in ES I can see that a brand new doc was created (with the same metadata.id but a new metadata.revSeqno coming from Couchbase), with a different Elasticsearch _id.
Naturally, in Couchbase there’s only one such doc.
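The check I am doing by eye can be sketched roughly like this in Python. It assumes the hits were already fetched from the index with an ordinary search request and have the usual _id / _source shape, with the Couchbase metadata under the 'metadata' field; find_duplicate_ids is a hypothetical helper name, not anything provided by the connector.

```python
from collections import defaultdict


def find_duplicate_ids(hits, metadata_field="metadata"):
    """Return {couchbase_id: [es_ids]} for Couchbase ids indexed under more than one _id."""
    es_ids_by_key = defaultdict(set)
    for hit in hits:
        # Assumption: Couchbase metadata lives under _source.<metadata_field>.id
        meta = hit.get("_source", {}).get(metadata_field, {})
        key = meta.get("id")
        if key is not None:
            es_ids_by_key[key].add(hit["_id"])
    # Only keys that map to several Elasticsearch documents are duplicates.
    return {key: sorted(ids) for key, ids in es_ids_by_key.items() if len(ids) > 1}
```

In my case every mutated document shows up in the result, each with one Elasticsearch _id per mutation.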
The basic config I’m using is:
[elasticsearch.docStructure]
metadataFieldName = 'metadata'
documentContentAtTopLevel = true
wrapCounters = false
[elasticsearch.typeDefaults]
index = ''
pipeline = 'cbes-filter'
typeName = '_doc'
ignore = true
ignoreDeletes = false
[[elasticsearch.type]]
prefix = 'client-unit'
index = 'shs-client-units'
ignore = false
ignoreDeletes = false
I’m using the latest version, 4.4.2, from the official Docker image.
Is there anything I’m supposed to do to make the connector upsert into Elasticsearch and avoid creating duplicates? How can I make it use the Couchbase document ID as the Elasticsearch _id?
Many thanks!