hadoop-mapreduce-dev mailing list archives

From Arun C Murthy <...@yahoo-inc.com>
Subject Re: Mapreduce: scheduling and task placement work
Date Tue, 16 Nov 2010 17:51:08 GMT


  There isn't a concerted effort AFAIK, given the complexity of the  
task at hand, as I'm sure you will appreciate.

  There are several pieces slowly falling in place:
  # The CapacityScheduler (CS) already allows for memory-based  
scheduling (i.e. support for 'High RAM' jobs). This is a subtle  
change: rather than looking at abstract 'slots', the CS views every  
machine as being made up of real memory slots.
  # The TaskTracker in trunk (hadoop-0.22) already reports per-task  
CPU and memory usage.
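To make the "real memory slots" idea concrete, here is a minimal sketch, assuming each slot stands for a fixed amount of memory and a 'High RAM' task is charged for as many slots as it needs, rounded up. The function names and parameters (`slots_required`, `can_schedule`, `slot_memory_mb`, etc.) are hypothetical and are not the CapacityScheduler's actual API.

```python
import math


def slots_required(task_memory_mb: int, slot_memory_mb: int) -> int:
    """Hypothetical helper: how many fixed-size memory slots a task occupies.

    A normal task fits in a single slot; a 'High RAM' task that asks for
    more memory than one slot provides is charged ceil(request / slot) slots.
    """
    if slot_memory_mb <= 0:
        raise ValueError("slot_memory_mb must be positive")
    return max(1, math.ceil(task_memory_mb / slot_memory_mb))


def can_schedule(free_slots: int, task_memory_mb: int, slot_memory_mb: int) -> bool:
    """Hypothetical check: can a TaskTracker with `free_slots` free slots host the task?"""
    return slots_required(task_memory_mb, slot_memory_mb) <= free_slots


# Example: with 1024 MB slots, a 3072 MB task needs 3 slots, so a
# TaskTracker with only 2 free slots cannot take it.
print(slots_required(3072, 1024), can_schedule(2, 3072, 1024))
```

The point of the sketch is that scheduling decisions are still expressed in slots, but the number of slots a task consumes is derived from its memory request rather than being fixed at one.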

  Clearly it will take some effort to move from here to truly dynamic  
slot-less scheduling, since the notion of slots is fairly deeply  
entrenched in the framework (JobTracker, TaskTracker etc.).

  Of course, I don't mean to discourage you!

  Feel free to start opening JIRAs, jot your thoughts down,  
make some proposals and get involved!


On Nov 16, 2010, at 4:51 AM, abhishek sharma wrote:

> Hi,
> In his e-mail on the Hadoop Common mailing list, Steve Loughran
> mentioned the following:
> "There's work underway to be more aware of system load when scheduling
> things, rather than have a fairly simplistic "slot" model, look more
> at system load and memory load as a way of measuring how idle machines
> are. If you were to be really devious, you'd look at io load, network,
> machine temperature, etc. If you find this an interesting problem to
> get involved in, the mapreduce-dev mailing list is the place to get
> involved."
> I would like to get involved.
> I recently finished my PhD in computer science at the Univ. of
> Southern California. In one of my projects, I modified the MapReduce
> scheduler to implement a particular job priority scheme.
> Thanks,
> Abhishek
