hadoop-common-dev mailing list archives

From Hemanth Yamijala <yhema...@gmail.com>
Subject Re: Task scheduler
Date Fri, 14 May 2010 04:19:10 GMT

> Let me reframe my question. I wanted to know how the job tracker decides the
> assignment of input splits to task trackers based on each task tracker's data
> locality. Where is this policy defined? Is it pluggable?

Sorry, I misunderstood your question then. This code is in
o.a.h.mapred.JobInProgress. It is likely spread across many methods in
the class, but a good starting point would be methods like
obtainNewMapTask or obtainNewReduceTask.
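
To make the idea of locality-based assignment concrete, here is a minimal,
self-contained sketch. It is not the JobInProgress code: the Split class, the
rackOf naming rule, and pickSplit are all hypothetical names invented for
illustration. It only assumes the general preference order of node-local
first, then rack-local, then anything else.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LocalitySketch {
    /** Hypothetical stand-in for an input split and the hosts holding its replicas. */
    static final class Split {
        final String name;
        final List<String> hosts;

        Split(String name, String... hosts) {
            this.name = name;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** Hypothetical topology rule for this sketch: host "r1-n3" sits in rack "r1". */
    static String rackOf(String host) {
        int dash = host.indexOf('-');
        return dash < 0 ? "default-rack" : host.substring(0, dash);
    }

    /** 0 = node-local, 1 = rack-local, 2 = off-rack. */
    static int localityLevel(Split split, String trackerHost) {
        if (split.hosts.contains(trackerHost)) {
            return 0;
        }
        for (String host : split.hosts) {
            if (rackOf(host).equals(rackOf(trackerHost))) {
                return 1;
            }
        }
        return 2;
    }

    /** Removes and returns the closest pending split for the tracker, or null if none remain. */
    static Split pickSplit(List<Split> pending, String trackerHost) {
        Split chosen = null;
        int bestLevel = 3; // worse than any real level, so the first split always qualifies
        for (Split s : pending) {
            int level = localityLevel(s, trackerHost);
            if (level < bestLevel) {
                bestLevel = level;
                chosen = s;
            }
        }
        if (chosen != null) {
            pending.remove(chosen);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Split> pending = new ArrayList<>();
        pending.add(new Split("a", "r1-n1", "r1-n2"));
        pending.add(new Split("b", "r2-n1"));
        pending.add(new Split("c", "r1-n3"));

        System.out.println(pickSplit(pending, "r2-n1").name); // node-local match
        System.out.println(pickSplit(pending, "r1-n4").name); // rack-local match
        System.out.println(pickSplit(pending, "r3-n1").name); // off-rack fallback
    }
}
```

The real class tracks considerably more state (running, failed, and
speculative tasks, among other things), so treat this only as a map of the
basic preference order when reading the actual methods.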

At the moment, this policy is not pluggable. But I know there have
been discussions (possibly even a JIRA, though I can't locate any now)
asking for this capability.


> On Fri, May 14, 2010 at 7:04 AM, Hemanth Yamijala <yhemanth@gmail.com> wrote:
>> Saurabh,
>> > I am experimenting with Hadoop. I wanted to ask: is the task distribution
>> > policy of the job tracker pluggable? If yes, where in the code tree is it
>> > defined?
>> >
>> Take a look at o.a.h.mapred.TaskScheduler. That's the abstract class
>> that needs to be extended to define a new scheduling policy. Also,
>> please do take a look at the existing schedulers that extend this
>> class. There are 3-4 implementations including the default scheduler,
>> capacity scheduler, fairshare scheduler and dynamic priority
>> scheduler. It may be worthwhile to see if your ideas match any of the
>> existing implementations to some degree and then consider enhancing
>> those as a first option.
>> Thanks
>> Hemanth
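
The "extend an abstract class" pattern described above can be sketched
without Hadoop on the classpath. Everything below is hypothetical: the
Scheduler base class, its two methods, FifoScheduler, and load are invented
for illustration and are not the o.a.h.mapred.TaskScheduler API. The sketch
only shows the shape of a pluggable policy: an abstract base, one concrete
policy, and selection of the implementation by class name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SchedulerSketch {
    /** Hypothetical, simplified analogue of an abstract scheduler base class. */
    static abstract class Scheduler {
        abstract void addJob(String jobId, int totalTasks);

        /** Returns ids of the tasks to launch on a tracker with the given number of free slots. */
        abstract List<String> assignTasks(String tracker, int freeSlots);
    }

    /** One possible policy: serve jobs strictly in submission order. */
    static final class FifoScheduler extends Scheduler {
        private static final class Job {
            final String id;
            final int total;
            int launched;

            Job(String id, int total) {
                this.id = id;
                this.total = total;
            }
        }

        private final Deque<Job> queue = new ArrayDeque<>();

        @Override
        void addJob(String jobId, int totalTasks) {
            queue.addLast(new Job(jobId, totalTasks));
        }

        @Override
        List<String> assignTasks(String tracker, int freeSlots) {
            List<String> out = new ArrayList<>();
            while (out.size() < freeSlots && !queue.isEmpty()) {
                Job head = queue.peekFirst();
                if (head.launched == head.total) {
                    queue.removeFirst(); // job fully handed out, move to the next one
                    continue;
                }
                out.add(head.id + "_t" + head.launched++);
            }
            return out;
        }
    }

    /** Instantiates a scheduler from its class name, as a config-driven loader might. */
    static Scheduler load(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(Scheduler.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load scheduler " + className, e);
        }
    }

    public static void main(String[] args) {
        Scheduler scheduler = load(FifoScheduler.class.getName());
        scheduler.addJob("job1", 2);
        scheduler.addJob("job2", 1);
        System.out.println(scheduler.assignTasks("tt1", 2));
        System.out.println(scheduler.assignTasks("tt2", 2));
    }
}
```

A new policy in this sketch is just another subclass whose name is passed to
load. The existing Hadoop schedulers are the better reference for what the
real base class expects from an implementation.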
