hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: MAX_MAPS_AT_ONCE in ShuffleScheduler class
Date Sun, 19 Jun 2011 07:24:58 GMT
Shrinivas,

Yes, your understanding is correct here.

From what I understand, previously (0.20.2 and earlier) only one map
output file was fetched per connection made to the TT.

With the MAPREDUCE-318 change, the shuffle code was reworked to pull
map output files in bulk (several per connection, hard-limited to 20
outputs, as you noticed). I've posted your comment on that JIRA, so
you may get a resolution there, since I now have the same question.

(Basically, I think it's to avoid overloading the TT by keeping a
connection open too long when there are lots of files?)
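
To make the cap concrete, here is a rough sketch of the batching
behaviour as I understand it from your log lines. It is not the actual
ShuffleScheduler code: the class name, the takeBatch helper and the
String map IDs are all made up for illustration, and only the
MAX_MAPS_AT_ONCE name and its value of 20 come from this thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ShuffleBatchSketch {
    // Same name and value as the constant discussed in this thread;
    // everything else in this class is hypothetical.
    static final int MAX_MAPS_AT_ONCE = 20;

    /**
     * Removes up to MAX_MAPS_AT_ONCE entries from the host's pending list
     * and returns them as the batch handed to one fetcher.
     */
    static List<String> takeBatch(List<String> pending) {
        List<String> batch = new ArrayList<>();
        Iterator<String> it = pending.iterator();
        while (it.hasNext() && batch.size() < MAX_MAPS_AT_ONCE) {
            batch.add(it.next());
            it.remove(); // whatever is left waits for a later assignment
        }
        return batch;
    }

    public static void main(String[] args) {
        // 53 pending map outputs on one host, as in the quoted log.
        List<String> pending = new ArrayList<>();
        for (int i = 0; i < 53; i++) {
            pending.add("map_" + i);
        }
        int round = 1;
        while (!pending.isEmpty()) {
            int before = pending.size();
            List<String> batch = takeBatch(pending);
            System.out.println("round " + round++ + ": assigned "
                    + batch.size() + " of " + before);
        }
    }
}
```

With 53 outputs on the host, this takes three rounds (20, 20, then
13). The first round corresponds to the "assigned 20 of 53" line in
your log; presumably the remaining outputs are picked up by later
assignments in the real scheduler as well.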

On Sun, Jun 19, 2011 at 4:01 AM, Shrinivas Joshi <jshrinivas@gmail.com> wrote:
> We see the following type of lines in our reducer log files. Based on my
> understanding it looks like the target map host has 53 map outputs that are
> ready to be fetched. The shuffle scheduler seems to be allowing only 20 of
> them to be fetched at a time. This is controlled by MAX_MAPS_AT_ONCE
> variable in ShuffleScheduler class. Is my understanding of this log output
> correct? If so, why is MAX_MAPS_AT_ONCE set to 20?
>
> Thanks for your time.
>
> -Shrinivas
>
> INFO org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
> Assiging hostname:50060 with 53 to fetcher#16
> INFO org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
> assigned 20 of 53 to hostname:50060 to fetcher#16
>



-- 
Harsh J
