hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: how to read binary data from hdfs
Date Tue, 01 May 2012 12:50:33 GMT
Amritanshu,

Implement your own custom InputFormat with a RecordReader and you can
read your files directly.

To learn how to implement custom readers and formats, see the example
under the sub-title "Processing a whole file as a record" (page 206,
Chapter 7: MapReduce Types and Formats) in Tom White's Hadoop: The
Definitive Guide, or read up on the details at
http://developer.yahoo.com/hadoop/tutorial/module5.html#inputformat.
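
For the decoding half of a RecordReader, here is a minimal sketch in plain
Java. Everything specific in it is an assumption for illustration, not
something taken from your C program: I am guessing a packed 12-byte record
(a 4-byte int id followed by an 8-byte double value), written in
little-endian order with no struct padding. The names BinaryRecordParser,
Sample and readAll are hypothetical. Adjust the size, the fields and the
byte order to match what your writer actually emits.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for fixed-width binary records written by a C program.
final class BinaryRecordParser {
    // Assumed layout: int32 id + double value, packed, little-endian.
    static final int RECORD_SIZE = 12;

    // Plain value object that a map function could consume.
    static final class Sample {
        final int id;
        final double value;

        Sample(int id, double value) {
            this.id = id;
            this.value = value;
        }
    }

    // Reads records until the stream ends on a record boundary.
    // A record that is cut short makes readFully throw EOFException.
    static List<Sample> readAll(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] buf = new byte[RECORD_SIZE];
        List<Sample> out = new ArrayList<>();
        while (true) {
            int first = data.read();
            if (first < 0) {
                break; // clean end of stream
            }
            buf[0] = (byte) first;
            data.readFully(buf, 1, RECORD_SIZE - 1);
            // Java defaults to big-endian, so set the order explicitly.
            ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
            int id = bb.getInt();
            double value = bb.getDouble();
            out.add(new Sample(id, value));
        }
        return out;
    }
}
```

Since readAll only needs an InputStream, inside a custom RecordReader you
could pass it the FSDataInputStream returned by FileSystem.open(path), which
is a java.io.DataInputStream subclass. Note that a C compiler will often pad
a struct like this to 16 bytes unless it is packed or the fields are written
one by one, so check the real on-disk layout before relying on the 12-byte
assumption.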

On Tue, May 1, 2012 at 3:42 PM, Amritanshu Shekhar
<amritanshu.shekhar@exponential.com> wrote:
> Hi Guys,
> I want to read binary data (produced by a C program) that is copied to HDFS
> using a Java program. The idea is that I would eventually write a map-reduce
> job that uses the aforementioned program's output (the Java program would
> read the binary data and create a Java object which the map function would
> use). I read about the sequence file format that Hadoop supports, but
> converting the binary data into sequence file format using Java
> serialization would add another layer of complexity. Is there a simple,
> no-frills API that I can use to read binary data directly from HDFS? Any
> help/resources would be deeply appreciated.
> Thanks and Regards,
> Amritanshu



-- 
Harsh J
