hadoop-common-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: cluster under-utilization with Hadoop Fair Scheduler
Date Sun, 11 Apr 2010 19:30:07 GMT
Hi Abhishek,

This behavior is improved by MAPREDUCE-706 I believe (not certain that
that's the JIRA, but I know it's fixed in trunk fairscheduler). These
patches are included in CDH3 (currently in beta)
http://archive.cloudera.com/cdh/3/

In general, though, map tasks that are so short are not going to be very
efficient - even with fast assignment there is some constant overhead per
task.

Thanks
-Todd

On Sun, Apr 11, 2010 at 11:42 AM, abhishek sharma <absharma@usc.edu> wrote:

> Hi all,
>
> I have been using the Hadoop Fair Scheduler for some experiments on a
> 100 node cluster with 2 map slots per node (hence, a total of 200 map
> slots).
>
> In one of my experiments, all the map tasks finish within a heartbeat
> interval of 3 seconds. I noticed that the maximum number of
> concurrently
> active map slots on my cluster never exceeds 100, and hence, the
> cluster utilization during my experiments never exceeds 50% even when
> large jobs with more than 1000 maps are being executed.
>
> A look at the Fair Scheduler code (in particular, the assignTasks
> function) revealed the reason.
> As per my understanding, with the implementation in Hadoop 0.20.0, a
> TaskTracker is not assigned more than 1 map and 1 reduce task per
> heartbeat.
>
> In my experiments, in every heartbeat, each TT has 2 free map slots
> but is assigned only 1 map task, and hence, the utilization never goes
> beyond 50%.
>
> Of course, this (degenerate) case does not arise when map tasks take
> more than one heartbeat interval to finish. For example, I repeated
> the experiments with map tasks taking close to 15 s to finish and
> noticed close to 100% utilization when large jobs were executing.
>
> Why does the Fair Scheduler not assign more than one map task to a TT
> per heartbeat? Is this done to spread the load uniformly across the
> cluster?
> I looked at the assignTasks function in the default Hadoop scheduler
> (JobQueueTaskScheduler.java), and it does assign more than 1 map task
> per heartbeat to a TT.
>
> It would be easy to change the Fair Scheduler to assign more than 1 map
> task to a TT per heartbeat (I did that and achieved 100% utilization
> even with small map tasks). But I am wondering if doing so would
> violate some fairness properties.
>
> Thanks,
> Abhishek
>
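The arithmetic behind the 50% ceiling Abhishek describes can be checked with a toy simulation. This is illustrative Python only, not actual Hadoop scheduler code; the `simulate` function and all of its parameter names are made up for this sketch. It assumes 100 TaskTrackers with 2 map slots each and map tasks so short that everything launched in one heartbeat has finished by the next.

```python
# Toy model of per-heartbeat task assignment (hypothetical, not Hadoop code).
# With tasks shorter than one heartbeat interval, the peak number of
# concurrent maps is bounded by (trackers x tasks_per_heartbeat).

def simulate(trackers=100, slots_per_tracker=2, tasks_per_heartbeat=1,
             pending_maps=1000, heartbeats=5):
    """Return the peak number of concurrently running map tasks."""
    peak = 0
    for _ in range(heartbeats):
        # Tasks are short: everything launched last heartbeat has finished,
        # so every slot on every TaskTracker is free again.
        running = [0] * trackers
        for t in range(trackers):
            free = slots_per_tracker - running[t]
            assign = min(free, tasks_per_heartbeat, pending_maps)
            running[t] += assign
            pending_maps -= assign
        peak = max(peak, sum(running))
    return peak

# Fair Scheduler in 0.20.0: at most 1 map per TT per heartbeat.
print(simulate(tasks_per_heartbeat=1))  # 100 -> 50% of the 200 slots
# Assigning up to every free slot per heartbeat removes the ceiling.
print(simulate(tasks_per_heartbeat=2))  # 200 -> 100% of the 200 slots
```

The model also shows why the problem disappears for 15 s tasks: once tasks span several heartbeats, slots stay occupied between assignments, so one new task per heartbeat is enough to keep both slots busy.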



-- 
Todd Lipcon
Software Engineer, Cloudera
