hadoop-common-user mailing list archives

From Ali Safdar Kureishy <safdar.kurei...@gmail.com>
Subject Reading data output by MapFileOutputFormat
Date Mon, 23 Apr 2012 10:53:21 GMT

If I use a *MapFileOutputFormat* to output some data, I see that each
reducer's output is a folder ("part-00000", for example), and inside that
folder are two files: "data" and "index".

However, there is no corresponding MapFileInputFormat to read this folder
("part-00000") back in. Instead, *SequenceFileInputFormat* seems to read the
data. So, I have some questions:
- Does SequenceFileInputFormat actually read *all* the data that was output
by MapFileOutputFormat? Or is some information relating the data and index
files lost in this process, information that another InputFormat class would
have handled better? In other words, is SequenceFileInputFormat the right
InputFormat to read data written by MapFileOutputFormat?
- How is it that SequenceFileInputFormat can read the outputs of *both*
MapFileOutputFormat and SequenceFileOutputFormat? That would imply either
that MapFileOutputFormat and SequenceFileOutputFormat output the same data,
or that SequenceFileInputFormat internally handles the two differently.
Which is it?
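
To make the second question concrete, here is a toy in-memory model of what I
imagine a "part-00000" folder to be. It is only a sketch under an assumption I
have not verified in this message: that "data" holds every record in key
order and "index" holds only a sparse sample of keys with their positions in
"data". The class name MapFileSketch, the INTERVAL value, and the method names
are all made up for illustration; none of this is Hadoop's API or
implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of a MapFile-style folder; NOT Hadoop's implementation.
class MapFileSketch {
    // Hypothetical sampling interval, chosen small so the demo is easy to follow.
    static final int INTERVAL = 2;

    // Stands in for part-00000/data: every record, in key order.
    final List<String[]> data = new ArrayList<>();
    // Stands in for part-00000/index: every INTERVAL-th key -> position in data.
    final TreeMap<String, Integer> index = new TreeMap<>();

    // Appends a record; keys are assumed to arrive in sorted order.
    void append(String key, String value) {
        if (!data.isEmpty() && data.get(data.size() - 1)[0].compareTo(key) > 0) {
            throw new IllegalArgumentException("keys out of order: " + key);
        }
        if (data.size() % INTERVAL == 0) {
            index.put(key, data.size());
        }
        data.add(new String[] {key, value});
    }

    // Sequential scan: reads only "data" and never consults "index".
    List<String> scanValues() {
        List<String> out = new ArrayList<>();
        for (String[] rec : data) {
            out.add(rec[1]);
        }
        return out;
    }

    // Random lookup: uses "index" to choose a start position, then scans forward.
    String get(String key) {
        Map.Entry<String, Integer> start = index.floorEntry(key);
        if (start == null) {
            return null;
        }
        for (int i = start.getValue(); i < data.size(); i++) {
            int cmp = data.get(i)[0].compareTo(key);
            if (cmp == 0) {
                return data.get(i)[1];
            }
            if (cmp > 0) {
                break; // passed the place where the key would be
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MapFileSketch part = new MapFileSketch();
        part.append("a", "1");
        part.append("b", "2");
        part.append("c", "3");
        System.out.println(part.scanValues());
        System.out.println(part.get("b"));
    }
}
```

If the real layout matches this model, then a reader that walks "data" from
start to end sees every record, and "index" only matters for keyed lookups.
Whether that is what actually happens is exactly what I am asking.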

