Hello,
I would like to know if there is any “pipelined” system or product that makes it easy to bridge data between Couchbase (buckets) and Hadoop HDFS. The goal is the best possible performance and an easy connection between the two environments.
Please share your opinion.
Thanks,
W
@couchbwiss there are some ways to do it. You can use our Hadoop connector, go through Kafka, or even Spark. See our connectors for more information: http://docs.couchbase.com/connectors/intro/index.html
If you can give us more context on your setup, that would help too!
Thanks @daschl .
I have millions of documents in a Couchbase bucket and I want to process the JSON in a Spark cluster (on top of an HDFS cluster). Large data set analysis will happen on the Spark side.
The Couchbase cluster and the Spark cluster run on separate servers in the same DC.
My goal is the best possible performance when loading data from Couchbase to HDFS and vice versa, ideally a pipelined solution that gets close to real time.
1- Does the Spark connector support such a pipelined approach?
2- The examples in the documentation are given only in Scala; is there an example in Java that I can follow?
Thanks,
W
@couchbwiss you are in luck! Two days ago I released the beta version, which includes a Java API too… Please give us feedback and feel free to ask questions as you move along: Couchbase Spark Connector 1.0 Beta Release - The Couchbase Blog http://docs.couchbase.com/connectors/spark-1.0/spark-intro.html
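To give you a head start with the Java API, here is a minimal sketch of the Couchbase-to-HDFS direction along the lines of the beta docs. Treat the specifics as assumptions: couchbaseContext and couchbaseGet are the beta names and may still change before GA, and the host, bucket, document IDs, and HDFS path are all placeholders.

```java
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.spark.japi.CouchbaseSparkContext;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;

import static com.couchbase.spark.japi.CouchbaseSparkContext.couchbaseContext;

public class CouchbaseToHdfs {
    public static void main(String[] args) {
        // Cluster nodes and bucket are configured on the SparkConf
        SparkConf conf = new SparkConf()
            .setAppName("couchbaseToHdfs")
            .set("com.couchbase.nodes", "couchbase-host")   // placeholder host
            .set("com.couchbase.bucket.default", "");       // bucket name -> password

        JavaSparkContext sc = new JavaSparkContext(conf);
        // Wrap the plain context to get the Couchbase-specific methods
        CouchbaseSparkContext csc = couchbaseContext(sc);

        // Fetch documents by ID into an RDD (IDs are placeholders; a real
        // job would feed these from a view or N1QL query instead)
        JavaRDD<JsonDocument> docs = csc.couchbaseGet(Arrays.asList("doc-1", "doc-2"));

        // Dump the raw JSON to HDFS, one document per line (path is a placeholder)
        docs.map(doc -> doc.content().toString())
            .saveAsTextFile("hdfs://namenode:8020/couchbase-dump");
    }
}
```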
Btw: yes, the connector pushes and loads RDDs on demand from the workers, so you should get great performance out of it.
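For the other direction, writing results from Spark back into Couchbase, the beta looks roughly like this. Again a sketch: CouchbaseDocumentRDD.couchbaseDocumentRDD and saveToCouchbase are the beta names as I have them, and the result documents are made up for illustration.

```java
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;
import com.couchbase.spark.japi.CouchbaseDocumentRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;

public class SparkToCouchbase {
    public static void writeResults(JavaSparkContext sc) {
        // A couple of made-up result documents; normally this RDD would be
        // the output of your analysis job
        JavaRDD<JsonDocument> results = sc.parallelize(Arrays.asList(
            JsonDocument.create("result-1", JsonObject.create().put("score", 42)),
            JsonDocument.create("result-2", JsonObject.create().put("score", 17))
        ));

        // Each partition is written from its own worker in parallel,
        // so no single node funnels the whole data set
        CouchbaseDocumentRDD.couchbaseDocumentRDD(results).saveToCouchbase();
    }
}
```

Because the save runs per partition, write throughput scales with the number of Spark workers rather than bottlenecking on the driver.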
Thanks man @daschl
I will try it and give you my feedback.
Can you send me your email address in private so I can follow up with you on the results of my tests?
Cheers
W
@couchbwiss sent you a private message here in the forums.
Received with big thanks, @daschl.