hadoop-mapreduce-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: Regarding FIFO scheduler
Date Thu, 22 Sep 2011 13:43:22 GMT
In most cases, your job will have more map tasks than map slots. You
want the reducers to spin up at some point before all your maps
complete, so that the shuffle and sort can work in parallel with some
of your map tasks. I usually set slow start to 80%, sometimes higher
if I know the maps are slow and they do a lot of filtering, so there
isn't too much intermediate data.
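For reference, a slow start of 80% can be set in the job configuration. This is a minimal sketch using the property names already mentioned in this thread; the newer mapreduce.* name is shown, while older releases use mapred.reduce.slowstart.completed.maps:

```xml
<!-- mapred-site.xml (cluster default) or per-job configuration -->
<!-- Start scheduling reduce tasks once 80% of the job's maps have completed. -->
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.80</value>
</property>
```

The value is a fraction between 0 and 1, so 0.80 corresponds to the 80% figure above.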

-Joey

On Thu, Sep 22, 2011 at 6:38 AM, Praveen Sripati
<praveensripati@gmail.com> wrote:
> Joey,
>
> Thanks for the response.
>
> 'mapreduce.job.reduce.slowstart.completedmaps' is default set to 0.05 and
> says 'Fraction of the number of maps in the job which should be complete
> before reduces are scheduled for the job.'
>
> Shouldn't the map tasks be completed before the reduce tasks are kicked off
> for a particular job?
>
> Praveen
>
> On Thu, Sep 22, 2011 at 6:53 PM, Joey Echeverria <joey@cloudera.com> wrote:
>>
>> The jobs would run in parallel since J1 doesn't use all of your map
>> slots. Things get more interesting with reduce slots. If J1 is an
>> overall slower job, and you haven't configured
>> mapred.reduce.slowstart.completed.maps, then J1 could launch a bunch
>> of idle reduce tasks which would starve J2.
>>
>> In general, it's best to configure the slow start property and to use
>> the fair scheduler or capacity scheduler.
>>
>> -Joey
>>
>> On Thu, Sep 22, 2011 at 6:05 AM, Praveen Sripati
>> <praveensripati@gmail.com> wrote:
>> > Hi,
>> >
>> > Let's assume that there are two jobs, J1 (100 map tasks) and J2 (200 map
>> > tasks), and the cluster has a capacity of 150 map slots (15 nodes with 10
>> > map slots per node), and Hadoop is using the default FIFO scheduler. If I
>> > submit J1 first and then J2, will the jobs run in parallel, or does J1
>> > have to complete before J2 starts?
>> >
>> > I was reading 'Hadoop - The Definitive Guide' and it says: "Early
>> > versions of Hadoop had a very simple approach to scheduling users' jobs:
>> > they ran in order of submission, using a FIFO scheduler. Typically, each
>> > job would use the whole cluster, so jobs had to wait their turn."
>> >
>> > Thanks,
>> > Praveen
>> >
>>
>>
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
