hadoop-common-user mailing list archives

From biro lehel <lehel.b...@yahoo.com>
Subject Re: Set number of mappers by the number of input lines for a single file?
Date Sun, 20 May 2012 10:33:33 GMT
Hello Harsh,

Thanks for your answer. The problem is that I'm using version 0.20.2 and, as far as I can tell,
NLineInputFormat is not implemented there (at least I couldn't find it). Switching to another
version would be a big deal in my infrastructure, since I'm using VMs deployed from
images already pre-configured with 0.20.2, so it is not an option at the moment. What should
I do?

Thanks, 
Lehel.

--- On Sun, 5/20/12, Harsh J <harsh@cloudera.com> wrote:

From: Harsh J <harsh@cloudera.com>
Subject: Re: Set number of mappers by the number of input lines for a single file?
To: common-user@hadoop.apache.org
Date: Sunday, May 20, 2012, 12:52 PM

Lehel,

You may use the NLineInputFormat with N=1:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html
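
To make the effect of N=1 concrete, here is a minimal plain-Java sketch of the
grouping idea: the input lines are cut into consecutive groups of N lines, and
each group would be handed to one map task. This is only an illustration under
stated assumptions. The class LineSplitSketch and the method splitByLines are
hypothetical names, the code does not use or reproduce the Hadoop API, and the
sample lines are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: groups input lines the way an
// N-lines-per-split input format would, one group per map task.
public class LineSplitSketch {

    // Returns consecutive groups of at most linesPerSplit lines, in input order.
    public static List<List<String>> splitByLines(List<String> lines, int linesPerSplit) {
        if (linesPerSplit < 1) {
            throw new IllegalArgumentException("linesPerSplit must be >= 1");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            // Copy the sub-range so each split is independent of the input list.
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Made-up stand-ins for the lines of the single input file.
        List<String> input = Arrays.asList("model-a", "model-b", "model-c");
        // With N=1, every line becomes its own split: one map task per line.
        List<List<String>> splits = splitByLines(input, 1);
        System.out.println(splits.size() + " splits: " + splits);
    }
}
```

With N=1 the number of groups equals the number of lines in the file, which is
the "one mapper per line" behavior asked about below.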

On Sun, May 20, 2012 at 2:48 PM, biro lehel <lehel.biro@yahoo.com> wrote:
> Dear all,
>
> I have one single input file, which contains, on every line, some hydrological calibration
> models (data). Each line of the file should be processed and then the output from every line
> written to another single output file.
>
> I understand that Hadoop spawns as many mapper tasks as there are input files
> (meaning that, in my case, a single mapper would be generated). However, I want each
> mapper to deal with only a single line from my input file (number of mapper tasks =
> number of lines in my file).
>
> What is the best way to obtain such behavior? How should I specify this to Hadoop?
>
> Any suggestions are more than welcome.
>
> Thank you,
> Lehel.



-- 
Harsh J
