The premise is simple: in a world of disparate technologies that rarely work or integrate well together, Couchbase and Confluent Kafka are excellent products that complement each other extremely well. Couchbase is a linearly scalable, distributed NoSQL JSON database. Its primary use case is any application or web service that requires single-digit-millisecond read/write/update latency. It can serve as a System of Record (SoR), as a caching layer for fast-mutating transient data, or as a way to offload Db2/Oracle/SQL Server and similar databases so that downstream services can consume data from Couchbase instead.
Confluent Kafka is a full-fledged distributed streaming platform that is also linearly scalable and capable of handling trillions of events a day. Confluent Platform makes it easy to build real-time data pipelines and streaming applications by integrating data from multiple sources and locations into a single, central event streaming platform for your company.
In this blog post we will cover how seamlessly we can move data out of Couchbase and push it into a Confluent Kafka topic as replication events.
The Couchbase Kafka connector transfers documents from Couchbase efficiently and reliably using Couchbase's internal replication protocol, DCP. Every change to or deletion of a document generates a replication event, which is then sent to the configured Kafka topic.
The Kafka connector can be used both to move data out of Couchbase (source connector) and to move data from Kafka into Couchbase (sink connector). For the purposes of this blog we will configure and use the source connector.
At a high level, the architecture looks like the diagram below.
Pre-requisites
A Couchbase cluster running version 5.x or above. Download Couchbase here
Couchbase Kafka connector. Download the Couchbase Kafka connector here
Confluent Kafka. Download Confluent Kafka here
Configuring a Couchbase cluster is outside the scope of this blog post. However, we will discuss configuring Confluent Kafka and the Couchbase Kafka connector to move data out of Couchbase.
Configuring Confluent Kafka
Untar the package downloaded above on a VM/pod. For the purpose of this blog, I have deployed an Ubuntu pod in a Kubernetes cluster running on GKE.
$ tar -zxf confluent-5.2.1-2.12.tar.gz
Before installing Confluent Kafka, make sure the machine has Java 8 installed.
root@kafkaconnector:/# java -version
openjdk version "1.8.0_222"
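If Java 8 is not already present, it can be installed on an Ubuntu pod with something along these lines (a minimal sketch; package names vary by distribution):

$ apt-get update
$ apt-get install -y openjdk-8-jdk   # OpenJDK 8, as required by Confluent Kafka
$ java -version                      # confirm the version before proceeding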
Install/start Kafka
$ cd confluent-5.2.1
$ export PATH=$PATH:~/confluent-5.2.1/bin
$ confluent start
Kafka has the following processes, all of which should be up:
zookeeper is [UP]
kafka is [UP]
schema-registry is [UP]
kafka-rest is [UP]
connect is [UP]
ksql-server is [UP]
control-center is [UP]
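The state of these services can be rechecked at any time with the Confluent CLI bundled with Confluent Platform 5.2, for example:

$ confluent status          # shows the [UP]/[DOWN] state of each service
$ confluent log connect     # inspect the Kafka Connect log if a service is not up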
The pod running Confluent Kafka can be exposed to the local machine/laptop via a Kubernetes service (here a ClusterIP service combined with port-forwarding). The app pod file is here. The service YAML file is here.
$ kubectl get svc -n mynamespace
NAME          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
app-service   ClusterIP   10.51.248.154   <none>        9021/TCP,8083/TCP   115s
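The linked service file is not reproduced here, but a minimal sketch of a service that would produce the output above could look like the following (the selector label is an assumption about how the app pod is labeled; 9021 is Control Center and 8083 is the Kafka Connect REST port):

apiVersion: v1
kind: Service
metadata:
  name: app-service
  namespace: mynamespace
spec:
  selector:
    app: kafkaconnector      # assumed pod label
  ports:
    - name: control-center
      port: 9021
      targetPort: 9021
    - name: connect-rest
      port: 8083
      targetPort: 8083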
Port-forward the service to local port 9021.
$ kubectl port-forward service/app-service 9021:9021 --namespace cbdb
Forwarding from 127.0.0.1:9021 -> 9021
Forwarding from [::1]:9021 -> 9021
Hit the URL http://localhost:9021 to open the Confluent Control Center UI.
Configuring Couchbase Kafka Connector
Unzip the package downloaded above.
$ unzip kafka-connect-couchbase-3.4.5.zip
$ cd kafka-connect-couchbase-3.4.5/config
Edit the file quickstart-couchbase-source.properties with (at least) the following information:
Cluster Connection string
connection.cluster_address=cb-demo-0000.cb-demo.default.svc.cluster.local
Bucket name and bucket access credentials
connection.bucket=travel-sample
connection.username=Administrator
connection.password=pa$$word
Note: enter credentials for the bucket you want to move data from. In my example I am using the travel-sample bucket, with bucket user credentials.
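For context, the same properties file also defines the connector name, connector class, and the target Kafka topic. A sketch of the entries relevant to this walkthrough is shown below (values are illustrative; the exact option names and defaults are documented in the 3.4 source configuration reference linked at the end, and the topic name here matches the cb-topic used later):

# Connector identity and parallelism
name=test-couchbase-source
connector.class=com.couchbase.connect.kafka.CouchbaseSourceConnector
tasks.max=2
# Kafka topic that will receive the replication events
topic.name=cb-topic
# Couchbase connection settings from the steps above
connection.cluster_address=cb-demo-0000.cb-demo.default.svc.cluster.local
connection.bucket=travel-sample
connection.username=Administrator
connection.password=pa$$word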
Export the variable CONFLUENT_HOME
$ export CONFLUENT_HOME=/root/confluent-5.2.1
Start the Kafka connector
$ env CLASSPATH=./* connect-standalone $CONFLUENT_HOME/etc/schema-registry/connect-avro-standalone.properties config/quickstart-couchbase-source.properties
When the connector starts, it creates a Kafka topic named cb-topic, and we can see all documents from the Couchbase travel-sample bucket being transferred to the cb-topic topic as events.
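One quick way to verify this from the command line is to consume the topic directly (a sketch, assuming Kafka and the schema registry are running on their default local ports and the connector is using the Avro converter from connect-avro-standalone.properties):

$ kafka-avro-console-consumer --bootstrap-server localhost:9092 \
    --topic cb-topic --from-beginning \
    --property schema.registry.url=http://localhost:8081

Each Couchbase mutation or deletion should appear here as a separate event.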
Conclusion
In a matter of minutes one can integrate Couchbase and Confluent Kafka. Ease of use, deployment, and supportability are key factors when choosing a technology. In this blog post we saw that one can seamlessly move data out of Couchbase into a Kafka topic. Once the data is in a Kafka topic, KSQL can be used to create real-time stream processing applications that match business needs.
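As a taste of that next step, here is a minimal, illustrative KSQL statement (entered in the ksql CLI; the stream name is made up) that registers the connector's topic as a stream for further querying, relying on the Avro schema already held by the schema registry:

ksql> CREATE STREAM couchbase_events WITH (KAFKA_TOPIC='cb-topic', VALUE_FORMAT='AVRO');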
References:
- https://docs.confluent.io/current/quickstart/ce-quickstart.html#ce-quickstart
- https://docs.couchbase.com/kafka-connector/3.4/quickstart.html
- https://docs.couchbase.com/kafka-connector/3.4/source-configuration-options.html