Hadoop Connector - Binary Attachments

vzdavid · March 19, 2015, 3:58pm

I’ve used the Hadoop Connector to import data into HDFS with Sqoop. When I import a database of text documents, I can see the data comes in as CSV. I also have a database of text documents with image (jpg) attachments. When I Sqoop that database into HDFS, the data comes through as a huge blob. The documentation does not explain what the Hadoop Connector is doing. Can someone explain what it’s doing to the non-text data? Is it just dumping a byte array as a field in the CSV?

I’m trying to figure out how to write a map-reduce job in Hadoop to analyze the data.

Thank you!

ingenthr · March 19, 2015, 5:46pm

I believe it’s mentioned in the Hadoop connector documentation that we normalize values to strings. That’s done by calling toString() on the object. If you want to control that and get the data into some kind of normalized format, you can probably wrap it in a custom transcoder.
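To illustrate why string normalization is a problem for attachments, here is a minimal sketch in plain Java. It is not the connector's code or its transcoder interface. The class name BinaryFieldSketch and its helper methods are hypothetical, and decoding the bytes as UTF-8 is only an assumption standing in for whatever conversion the connector actually applies. It shows that pushing arbitrary bytes through a String is lossy, while a Base64 encoding (the kind of thing a custom transcoder might apply) survives as a text field and can be decoded again in a map-reduce job.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical sketch: not part of the Hadoop Connector or the Couchbase SDK.
public class BinaryFieldSketch {

    // Encode raw bytes as Base64 so they can sit safely in a text/CSV field.
    // The basic Base64 alphabet contains no commas, quotes, or newlines.
    public static String encodeField(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Recover the original bytes from the Base64 text field.
    public static byte[] decodeField(String field) {
        return Base64.getDecoder().decode(field);
    }

    // Assumed stand-in for naive string normalization: decode the bytes as
    // UTF-8 text, then turn that text back into bytes.
    public static byte[] naiveRoundTrip(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First four bytes of a typical JPEG file; not valid UTF-8.
        byte[] jpegHeader = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};

        boolean naiveIntact = Arrays.equals(jpegHeader, naiveRoundTrip(jpegHeader));
        boolean base64Intact =
                Arrays.equals(jpegHeader, decodeField(encodeField(jpegHeader)));

        System.out.println("naive string round trip intact: " + naiveIntact);
        System.out.println("Base64 round trip intact: " + base64Intact);
        System.out.println("Base64 field: " + encodeField(jpegHeader));
    }
}
```

In a mapper, the same decodeField step would turn the text field back into image bytes, assuming the export wrote Base64 in the first place.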

Are you looking to process the images in Hadoop?
