hadoop-common-user mailing list archives

From biro lehel <lehel.b...@yahoo.com>
Subject Re: Set number of mappers by the number of input lines for a single file?
Date Sun, 20 May 2012 11:13:32 GMT
Sorry, my bad, I was looking at a previous job. The current job, which uses NLineInputFormat, is
now running distributed across all the slaves, and I think I got what I was looking for.

Sorry again, I hope I didn't create confusion. :)

Thank you for taking the time to answer!

Cheers,
Lehel.

--- On Sun, 5/20/12, biro lehel <lehel.biro@yahoo.com> wrote:

From: biro lehel <lehel.biro@yahoo.com>
Subject: Re: Set number of mappers by the number of input lines for a single file?
To: common-user@hadoop.apache.org
Date: Sunday, May 20, 2012, 2:09 PM

Hello Harsh,

In the meantime I figured out what the problem was (my mistake: I had mixed the two APIs). However,
I read somewhere that using NLineInputFormat from the old API in 0.20.2 can cause problems, so I took
NLineInputFormat.java from the 2.0 branch and simply added it to my project, and it all went
fine.

However, I notice that although as many tasks are generated as there are lines in my input
file, the whole job still gets executed on a single node (a single slave);
at least, there is only one job showing up on my jobtracker, running on one of my slaves.
What I want is for all my running slaves to get involved in processing the very same (single)
input file, each handling separate lines of it. I don't even have
a reduce phase at the moment; I only want to process the input in the mapper.
Is the scenario I described achievable? How should I proceed?

Thank you,
Lehel.

--- On Sun, 5/20/12, Harsh J <harsh@cloudera.com> wrote:

From: Harsh J <harsh@cloudera.com>
Subject: Re: Set number of mappers by the number of input lines for a single file?
To: common-user@hadoop.apache.org
Date: Sunday, May 20, 2012, 1:54 PM

Biro,

0.20.2 did carry NLineInputFormat, but in the older/stable API package
(marked deprecated, but subsequently undeprecated). See
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html
which confirms that 0.20.2 carried it. For 0.20.2, I recommend
sticking to the mapred.* API package.

For the new API (mapreduce.* package) version, you can also grab the
source and include it with the license into your project (and follow
whatever is required in doing so) from here:
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/mapred/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java

Hope this helps.
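
To illustrate the N=1 suggestion from this thread, here is a minimal sketch in plain Java of how grouping a file's lines into chunks of N lines leads to one chunk (and so one map task) per line when N is 1. This is not Hadoop's implementation; the class name NLineSplitSketch and the method splitByLines are hypothetical names chosen for illustration. In the old mapred.* API, N is, to my knowledge, read from the "mapred.line.input.format.linespermap" property and defaults to 1; treat that property name as an assumption to verify against your Hadoop version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: mimics the idea of N-line input splits
// without using any Hadoop classes.
class NLineSplitSketch {

    /** Groups the input lines into consecutive chunks of at most n lines each. */
    public static List<List<String>> splitByLines(List<String> lines, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += n) {
            int end = Math.min(start + n, lines.size());
            // Copy the sub-range so each split is independent of the input list.
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Stand-in for the single input file with one calibration model per line.
        List<String> lines = Arrays.asList("model-1", "model-2", "model-3", "model-4", "model-5");

        // With n = 1, every line becomes its own split, i.e. its own map task.
        List<List<String>> splits = splitByLines(lines, 1);
        System.out.println("lines=" + lines.size() + ", splits=" + splits.size());
    }
}
```

With a larger N the number of chunks drops to roughly the line count divided by N, which is the trade-off between per-task overhead and parallelism.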

On Sun, May 20, 2012 at 4:03 PM, biro lehel <lehel.biro@yahoo.com> wrote:
> Hello Harsh,
>
> Thanks for your answer. The problem is that I'm using version 0.20.2, and, as far as I checked,
> NLineInputFormat is not implemented there (at least I couldn't find it). Switching to another
> version would be kind of a big deal in my infrastructure, since I'm using VMs deployed from
> images already pre-configured with 0.20.2, so it is not an option at the moment. What should
> I do?
>
> Thanks,
> Lehel.
>
> --- On Sun, 5/20/12, Harsh J <harsh@cloudera.com> wrote:
>
> From: Harsh J <harsh@cloudera.com>
> Subject: Re: Set number of mappers by the number of input lines for a single file?
> To: common-user@hadoop.apache.org
> Date: Sunday, May 20, 2012, 12:52 PM
>
> Lehel,
>
> You may use the NLineInputFormat with N=1:
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html
>
> On Sun, May 20, 2012 at 2:48 PM, biro lehel <lehel.biro@yahoo.com> wrote:
>> Dear all,
>>
>> I have one single input file, which contains, on every line, some hydrological calibration
>> models (data). Each line of the file should be processed and then the output from every line
>> written to another single output file.
>>
>> I understood that Hadoop spawns as many mapper tasks as there are input
>> files (meaning, in my case, a single mapper would be generated). However, I want
>> each mapper to deal with only a single line from my input file (number of mapper tasks
>> = number of lines in my file).
>>
>> What is the best way to obtain such behavior? How should I specify this to Hadoop?
>>
>> Any suggestions are more than welcome.
>>
>> Thank you,
>> Lehel.
>
>
>
> --
> Harsh J



-- 
Harsh J
