I’ve used the Hadoop Connector with Sqoop to import data into HDFS. When I import a database of text documents, the data arrives as CSV. I also have a database of text documents with image (JPEG) attachments. When I Sqoop that database into HDFS, the data comes through as one huge blob. The documentation does not explain what the Hadoop Connector is doing. Can someone explain what it does with the non-text data? Is it just dumping a byte array into a field in the CSV?
I’m trying to figure out how to write a MapReduce job in Hadoop to analyze this data.
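Here is a minimal sketch of the per-record logic such a mapper might run. It is plain Java so it can be tried without a cluster. It assumes, without having verified it, that the output is comma-delimited and that the blob column was written inline as space-separated hex byte pairs, which is how Hadoop's `BytesWritable.toString()` renders bytes. `BlobFieldSketch`, `decodeHexField`, and `looksLikeJpeg` are hypothetical names. Looking at a few raw lines with `hdfs dfs -cat` would confirm or rule out the format assumption.

```java
public class BlobFieldSketch {
    // Hypothetical helper: decodes a field written as space-separated hex byte pairs
    // (the BytesWritable.toString() style). Whether Sqoop wrote this column that way
    // is an ASSUMPTION to check against the real files.
    static byte[] decodeHexField(String field) {
        String trimmed = field.trim();
        if (trimmed.isEmpty()) {
            return new byte[0];
        }
        String[] parts = trimmed.split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // parseInt gives 0..255; the cast keeps the low 8 bits as a signed byte
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    // JPEG data starts with the bytes FF D8 FF.
    static boolean looksLikeJpeg(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8
                && (data[2] & 0xFF) == 0xFF;
    }

    public static void main(String[] args) {
        // Hypothetical record: id, title, blob-as-hex. Assumes a comma delimiter
        // and that the hex field itself contains no commas.
        String line = "42,report.txt,ff d8 ff e0 00 10";
        String[] fields = line.split(",", -1);
        byte[] blob = decodeHexField(fields[2]);
        System.out.println(blob.length + " bytes, jpeg=" + looksLikeJpeg(blob));
    }
}
```

In a real job, the body of `main` would move into a `Mapper.map` method that receives each line as a `Text` value.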
Thank you!