hadoop-common-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: cluster under-utilization with Hadoop Fair Scheduler
Date Sun, 11 Apr 2010 19:30:07 GMT
Hi Abhishek,

This behavior is improved by MAPREDUCE-706, I believe (not certain that's
the right JIRA, but I know it's fixed in the trunk fairscheduler). These
patches are included in CDH3 (currently in beta).

In general, though, map tasks that are so short are not going to be very
efficient - even with fast assignment there is some constant overhead per task.

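Todd's point about per-task overhead can be illustrated with a back-of-envelope sketch. The class name, method, and the ~1-second overhead figure below are all assumptions for illustration, not measurements from the thread:

```java
// Back-of-envelope sketch (assumed numbers, not measurements): how much of
// a map slot's time goes to useful work when each task carries a fixed
// scheduling/startup overhead on top of its actual computation.
public class TaskOverhead {
    // Fraction of slot time spent on useful work, for a task whose useful
    // work takes workSec seconds and whose constant overhead is overheadSec.
    static double efficiency(double workSec, double overheadSec) {
        return workSec / (workSec + overheadSec);
    }

    public static void main(String[] args) {
        // With an assumed ~1s per-task overhead, a 2s map task wastes a
        // third of its slot time, while a 15s task wastes only ~6%.
        System.out.printf("2s task:  %.0f%% useful work%n", 100 * efficiency(2, 1));
        System.out.printf("15s task: %.0f%% useful work%n", 100 * efficiency(15, 1));
    }
}
```

This is why fixing the assignment rate alone only helps so much for sub-heartbeat tasks: the constant per-task cost still dominates.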

On Sun, Apr 11, 2010 at 11:42 AM, abhishek sharma <absharma@usc.edu> wrote:

> Hi all,
> I have been using the Hadoop Fair Scheduler for some experiments on a
> 100 node cluster with 2 map slots per node (hence, a total of 200 map
> slots).
> In one of my experiments, all the map tasks finish within a heartbeat
> interval of 3 seconds. I noticed that the maximum number of
> concurrently
> active map slots on my cluster never exceeds 100, and hence, the
> cluster utilization during my experiments never exceeds 50% even when
> large jobs with more than 1,000 maps are being executed.
> A look at the Fair Scheduler code (in particular, the assignTasks
> function) revealed the reason.
> As per my understanding, with the implementation in Hadoop 0.20.0, a
> TaskTracker is not assigned more than 1 map and 1 reduce task per
> heartbeat.
> In my experiments, in every heartbeat, each TT has 2 free map slots
> but is assigned only 1 map task, and hence, the utilization never goes
> beyond 50%.
> Of course, this (degenerate) case does not arise when map tasks take
> more than one heartbeat interval to finish. For example, I repeated
> the experiments with map tasks taking close to 15 s to finish and
> noticed close to 100% utilization when large jobs were executing.
> Why does the Fair Scheduler not assign more than one map task to a TT
> per heartbeat? Is this done to spread the load uniformly across the
> cluster?
> I looked at the assignTasks function in the default Hadoop scheduler
> (JobQueueTaskScheduler.java), and it does assign more than 1 map task
> per heartbeat to a TT.
> It would be easy to change the Fair Scheduler to assign more than 1 map
> task to a TT per heartbeat (I did that and achieved 100% utilization
> even with small map tasks). But I am wondering if doing so would
> violate some fairness properties.
> Thanks,
> Abhishek
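
[Editor's note: the cap Abhishek describes can be sketched as a tiny simulation. This is hypothetical illustrative code, not the actual FairScheduler or JobQueueTaskScheduler implementation; the class name and method are invented, and it assumes his setup of 100 TaskTrackers with 2 map slots each and tasks that finish within one heartbeat interval:]

```java
// Hypothetical simulation (not actual Hadoop code) of per-heartbeat map
// task assignment on a 100-node cluster with 2 map slots per node, where
// every map task finishes within a single heartbeat interval.
public class HeartbeatSim {
    static final int TRACKERS = 100;
    static final int SLOTS_PER_TRACKER = 2;

    // Number of concurrently active map slots in steady state when the
    // scheduler assigns at most tasksPerHeartbeat map tasks to each
    // TaskTracker per heartbeat.
    static int activeSlots(int tasksPerHeartbeat) {
        int active = 0;
        for (int tt = 0; tt < TRACKERS; tt++) {
            // All of last interval's tasks have finished, so both slots are free.
            int freeSlots = SLOTS_PER_TRACKER;
            active += Math.min(tasksPerHeartbeat, freeSlots);
        }
        return active;
    }

    public static void main(String[] args) {
        // One map task per heartbeat (Fair Scheduler in Hadoop 0.20)
        System.out.println("1 task/heartbeat: " + activeSlots(1) + " of 200 slots");
        // Filling all free slots, as JobQueueTaskScheduler can
        System.out.println("fill free slots:  " + activeSlots(2) + " of 200 slots");
    }
}
```

With one task per heartbeat only 100 of 200 slots are ever busy (50% utilization); filling all free slots reaches 200, matching the numbers reported in the thread.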

Todd Lipcon
Software Engineer, Cloudera
