hadoop-user mailing list archives

From Marcelo Elias Del Valle <mvall...@gmail.com>
Subject Re: number of mapper tasks
Date Mon, 28 Jan 2013 16:31:33 GMT
Hello Harsh,

    First of all, thanks for the answer!

2013/1/28 Harsh J <harsh@cloudera.com>
> So depending on your implementation of the job here, you may or may
> not see it act in effect. Hope this helps.

Is there anything I can do in my job, my code or my InputFormat so that
Hadoop would choose to run more mappers? My text file has 10 million lines,
and each map task processes one line at a time, very quickly. I would like
to have 40 or more threads processing those lines in parallel.
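As background to this question: Hadoop launches one map task per input split, so the way to get more mappers is to get more splits. For line-oriented input, Hadoop ships an NLineInputFormat that gives each split a fixed number of lines. The sketch below only models the resulting arithmetic in plain Python; it does not call Hadoop, and the helper names are hypothetical.

```python
import math


def map_tasks_for_nline(total_lines: int, lines_per_split: int) -> int:
    """Hypothetical helper: map tasks created if every split holds a fixed number of lines."""
    if lines_per_split <= 0:
        raise ValueError("lines_per_split must be positive")
    # One map task per split; the last split may be shorter than the others.
    return math.ceil(total_lines / lines_per_split)


def lines_per_split_for(total_lines: int, desired_tasks: int) -> int:
    """Hypothetical helper: lines per split needed to get roughly `desired_tasks` map tasks."""
    if desired_tasks <= 0:
        raise ValueError("desired_tasks must be positive")
    return math.ceil(total_lines / desired_tasks)


if __name__ == "__main__":
    n = lines_per_split_for(10_000_000, 40)
    print(n, map_tasks_for_nline(10_000_000, n))
```

With 10 million lines and a target of 40 tasks, this gives 250,000 lines per split and therefore 40 map tasks; how many of them actually run at once still depends on the free map slots.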

> >     When I run my job with just 1 instance, I see it only creates 1 mapper.
> > When I run my job with 5 instances (1 master and 4 cores), I can see only 2
> > mapper slots are used and 6 stay open.
> Perhaps the job itself launched with 2 total map tasks? You can check
> this on the JobTracker UI or whatever EMR offers as a job viewer.

I am trying to figure this out. Here is what I have from EMR:
I will ask their support to help me understand this, but I didn't understand
what you said about the job being launched with 2 total map tasks. If I
have 8 slots, shouldn't all of them always be filled?
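A rough model of why slots can stay empty: the number of map tasks comes from the number of input splits, and slots only cap how many of those tasks run at the same time. The sketch below approximates how a file-based input format sizes splits for one splittable file (split size clamped between a minimum and a maximum around the block size, with a 10% slop on the last split); the 100 MB file and 64 MB block size are hypothetical numbers chosen for illustration, not values from this job.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # last split may be up to 10% larger than split_size


def count_splits(file_size: int, block_size: int,
                 min_size: int = 1, max_size: int = 2**63 - 1) -> int:
    """Approximate split count for one splittable file (modeled, not Hadoop's code)."""
    split_size = max(min_size, min(max_size, block_size))
    splits = 0
    remaining = file_size
    # Cut full-size splits while the remainder is clearly larger than one split.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes one final split.
    if remaining != 0:
        splits += 1
    return splits


def busy_slots(num_tasks: int, num_slots: int) -> int:
    """Slots in use at once: never more than there are tasks to run."""
    return min(num_tasks, num_slots)


if __name__ == "__main__":
    tasks = count_splits(100 * MB, 64 * MB)
    print(tasks, busy_slots(tasks, 8))
```

Under these assumed sizes the file yields 2 splits, so only 2 of 8 slots are busy, which matches the pattern described above; the other 6 slots stay open simply because there are no more map tasks to schedule.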

> This is a typical waiting reduce task log, what are you asking here
> specifically?

I have no reduce tasks. My map function does all the work and writes
nothing to the output. Is this happening because the reduce tasks receive
nothing as input?

Marcelo Elias Del Valle
http://mvalle.com - @mvallebr
