hadoop-common-user mailing list archives

From "Malcolm Matalka" <mmata...@millennialmedia.com>
Subject InputFormat not reading entire input file
Date Sun, 26 Oct 2008 05:42:15 GMT
I am trying to write my own input format, and I have basically been using TextInputFormat
as an example.  In this specific case I am trying to read fixed-length binary records into
a BytesWritable and an IntWritable.  As far as I can tell, the reading of an individual
record works correctly, but it does not seem to be reading all of the records.  My input format
always returns false for isSplittable; I was under the impression that this would keep the
file from being split up during mapping, but I don't think that is the case.

After a bunch of debugging, here is what I am seeing:
A bunch of reads start.
They all seem to hit EOF at the same point.
There are no further reads after that.

On small inputs everything works correctly.

What am I missing?  I did not see any documentation on how to write an input format.
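
For reference, here is a minimal sketch of reading fixed-length records from a stream. It uses only java.io, not the Hadoop API, and the class and method names (FixedLengthRecords, readAll, ShortReadStream) are hypothetical. It illustrates one possible cause of an apparent early EOF, not a confirmed diagnosis of the problem above: a single read() call may return fewer bytes than requested, and code that treats such a short read as end of file can work on small inputs and then stop early on large ones. Separately, the hook in Hadoop's FileInputFormat is spelled isSplitable (one "t"), so a method named isSplittable would not override it.

```java
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class FixedLengthRecords {

    /**
     * Reads every complete record of recordLength bytes (assumed > 0).
     * Stops cleanly when the stream ends on a record boundary; a trailing
     * partial record causes readFully to throw java.io.EOFException.
     */
    static List<byte[]> readAll(InputStream raw, int recordLength) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        List<byte[]> records = new ArrayList<>();
        while (true) {
            byte[] record = new byte[recordLength];
            int first = in.read(record, 0, recordLength);
            if (first < 0) {
                break; // genuine end of stream, on a record boundary
            }
            // read() may legally return fewer bytes than requested.
            // readFully keeps reading until the rest of the record arrives.
            if (first < recordLength) {
                in.readFully(record, first, recordLength - first);
            }
            records.add(record);
        }
        return records;
    }

    /** Hypothetical test helper: never returns more than maxChunk bytes per read. */
    static class ShortReadStream extends FilterInputStream {
        private final int maxChunk;

        ShortReadStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }
}
```

Wrapping the input in ShortReadStream forces short reads, which makes it easy to check that a reader still returns every record. In a real RecordReader the same idea would apply to the stream opened on the file split, with the record copied into the BytesWritable.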

