hadoop-general mailing list archives

From Amareshwari Sri Ramadasu <amar...@yahoo-inc.com>
Subject Re: InputFormat related question...
Date Tue, 22 Dec 2009 04:31:52 GMT
Hi,

If you want a map task to process two lines at a time, you need to write a RecordReader
that constructs one record from two lines; LineRecordReader makes each line a separate record.
You can extend NLineInputFormat to generate the splits and have it return your new RecordReader
for reading records from each split.
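The two-lines-per-record idea can be sketched in plain Java, with no Hadoop dependency. The class is hypothetical and does not implement Hadoop's RecordReader. It only borrows the method names nextKeyValue/getCurrentKey/getCurrentValue for readability, and an Iterator<String> stands in for the lines of one split. The key is a simple record counter, which is an assumption: Hadoop's LineRecordReader uses the byte offset of the line as its key.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: plain Java, no Hadoop classes. It models only the
// grouping a custom RecordReader would do: two consecutive lines -> one record.
class TwoLineRecordSketch {
    private final Iterator<String> lines; // stands in for the lines of one split
    private long key = -1;                // record counter, used as the key here
    private String value;                 // the two lines joined with '\n'

    TwoLineRecordSketch(Iterator<String> lines) {
        this.lines = lines;
    }

    /** Advances to the next record; returns false once the input is exhausted. */
    boolean nextKeyValue() {
        if (!lines.hasNext()) {
            return false;
        }
        String first = lines.next();
        // Assumption: an odd trailing line becomes a one-line record instead of being dropped.
        value = lines.hasNext() ? first + "\n" + lines.next() : first;
        key++;
        return true;
    }

    long getCurrentKey() { return key; }

    String getCurrentValue() { return value; }

    public static void main(String[] args) {
        TwoLineRecordSketch reader =
                new TwoLineRecordSketch(List.of("a", "b", "c", "d", "e").iterator());
        while (reader.nextKeyValue()) {
            System.out.println(reader.getCurrentKey() + " -> "
                    + reader.getCurrentValue().replace("\n", " | "));
        }
    }
}
```

Run on the five lines a through e, this prints `0 -> a | b`, `1 -> c | d` and `2 -> e`. A real reader would wrap a LineRecordReader and join its output the same way.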
Hope this helps you.

Thanks
Amareshwari

On 12/21/09 11:12 PM, "Something Something" <mailinglists19@gmail.com> wrote:

In my application I have a file in this format:

The first line of the file contains the data to be processed, and *each* of
the remaining lines contains parameters that will be used to slice & dice the
data in various ways. In other words, each mapper needs two lines: the first
line of the file, which contains the data, and one other line, which contains
parameters.
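The pairing wanted here, where each record is the data line plus one parameter line, can be pictured with a minimal plain-Java sketch. The class and method names are hypothetical. The sketch deliberately ignores the distributed part: in a real job only the split that starts at the beginning of the file contains the data line. A custom reader handling any other split would therefore have to obtain that line some other way, for example by reading the start of the file separately (an assumption, not something the thread specifies).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, plain Java: pairs the data line (line 1) with every
// parameter line that follows it, giving one (data, params) record per map call.
class DataParamPairs {
    static List<Map.Entry<String, String>> pairs(List<String> lines) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        if (lines.isEmpty()) {
            return records; // no data line, nothing to pair
        }
        String data = lines.get(0);
        for (String params : lines.subList(1, lines.size())) {
            records.add(Map.entry(data, params));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> file = List.of("DATA", "p1", "p2", "p3");
        for (Map.Entry<String, String> r : pairs(file)) {
            System.out.println(r.getKey() + " + " + r.getValue());
        }
    }
}
```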

I looked at NLineInputFormat, which can be used for "parameter sweeps", but
it's not quite what I want. I believe this format gives each mapper N
consecutive lines, correct?
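NLineInputFormat cuts the input into splits of N lines each, so one map task receives N consecutive lines. LineRecordReader still delivers those lines to the mapper one line per record. A minimal plain-Java sketch of that grouping (hypothetical names, no Hadoop classes) shows why it doesn't fit this file: with N = 2, the data line ends up in the same split as only the first parameter line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, plain Java: groups lines into chunks of n consecutive
// lines, mirroring how NLineInputFormat assigns n lines to each split.
class NLineSplitSketch {
    static List<List<String>> splits(List<String> lines, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += n) {
            int end = Math.min(start + n, lines.size());
            result.add(new ArrayList<>(lines.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        // With n = 2, line 1 (the data line) shares a split with line 2 only.
        System.out.println(splits(List.of("DATA", "p1", "p2", "p3"), 2));
    }
}
```

This prints `[[DATA, p1], [p2, p3]]`, so the mapper handling `p2` and `p3` never sees the data line.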

What's the best way to handle this case?  Do I have to write a special
InputFormat class?  Please help.  Thanks.
