hadoop-general mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject InputFormat related question...
Date Mon, 21 Dec 2009 17:42:49 GMT
In my application I have a file in this format:

The first line of the file contains the data to be processed, and *each* of
the remaining lines contains a set of parameters used to slice & dice that
data in various ways.  In other words, each mapper needs two lines: the first
line of the file, which holds the data, and one other line, which holds the
parameters.
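
To make the intended pairing concrete, here is a minimal sketch in plain Java.
It is only an illustration of the desired input to each mapper, not a Hadoop
InputFormat; the class name, method name, and sample lines are all made up for
this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, plain Java only (no Hadoop classes involved).
class FirstLinePairing {

    // Pairs the first line (the data) with each remaining line (one
    // parameter set). Returns one {data, parameters} pair per parameter line.
    static List<String[]> pairWithFirstLine(List<String> lines) {
        List<String[]> pairs = new ArrayList<>();
        if (lines.isEmpty()) {
            return pairs; // no data line, so there is nothing to pair
        }
        String data = lines.get(0);
        for (String params : lines.subList(1, lines.size())) {
            pairs.add(new String[] { data, params });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up file contents for illustration.
        List<String> lines =
                Arrays.asList("DATA", "params-A", "params-B", "params-C");
        for (String[] pair : pairWithFirstLine(lines)) {
            System.out.println(pair[0] + " + " + pair[1]);
        }
    }
}
```

Each pair produced here corresponds to what a single mapper would need to see.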

I looked at NLineInputFormat, which can be used for "parameter sweeps", but
it's not quite what I want.  I believe this format gives each mapper N
consecutive lines, correct?
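
For contrast, the sketch below shows the consecutive grouping as I understand
it, again in plain Java. It does not use or reproduce Hadoop's
NLineInputFormat code; the class and method names are hypothetical, and the
grouping rule is my assumption about the behavior described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, plain Java only (no Hadoop classes involved).
class ConsecutiveLineGroups {

    // Cuts the lines into groups of n consecutive lines; the last group
    // may be shorter when the line count is not a multiple of n.
    static List<List<String>> groupConsecutive(List<String> lines, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += n) {
            int end = Math.min(start + n, lines.size());
            groups.add(new ArrayList<>(lines.subList(start, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up file contents for illustration.
        List<String> lines =
                Arrays.asList("DATA", "params-A", "params-B", "params-C");
        System.out.println(groupConsecutive(lines, 2));
    }
}
```

With N = 2, only the first group contains the data line; every later group
holds two parameter lines and no data, which is why this grouping does not fit
my case.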

What's the best way to handle this case?  Do I have to write a special
InputFormat class?  Please help.  Thanks.
