hadoop-common-dev mailing list archives

From Zhenhua Guo <jen...@gmail.com>
Subject Re: mapper and reducer scheduling
Date Wed, 03 Nov 2010 21:40:26 GMT
Thanks, Jeff, Harsh, He, and Hemanth. That information is quite helpful!

Gerald

On Mon, Nov 1, 2010 at 12:01 AM, Hemanth Yamijala <yhemanth@gmail.com> wrote:
> Hi,
>
> On Mon, Nov 1, 2010 at 9:13 AM, He Chen <airbots@gmail.com> wrote:
>> If you use the default scheduler of Hadoop 0.20.2 or higher, the
>> JobQueueTaskScheduler will take data locality into account.
>
> This is true irrespective of the scheduler in use. Other schedulers
> currently add a layer to decide which job to pick up first based on
> constraints they choose to satisfy - like fairness, queue capacities
> etc. Once a job is picked up, the logic for picking up a task within
> the job is currently in framework code that all schedulers use.
>
>> That means when
>> a heartbeat from a TaskTracker (TT) arrives, the JobTracker (JT) first checks
>> a cache that maps each node to the data-local tasks it has. The JT assigns
>> node-local tasks first, then rack-local, non-local, recovery, and speculative
>> tasks, if they have default priorities.
>>
>> If a TT gets a non-local task, it reads the data from the nodes that hold it
>> and finishes the task. You can also decide whether or not to keep the fetched
>> data on this TT by configuring Hadoop's mapred-site.xml file.
>>
>> BTW, even when a TT gets a data-local task, it may also ask other data owners
>> (if you have more than one replica) for data to accelerate the process. (This
>> is my understanding; can anyone confirm?)
>
> Not that I am aware of. The task's input location is used directly to
> read the data.
>
> Thanks
> Hemanth
>>
>> Hope this will help.
>>
>> Chen
>>
>> On Sun, Oct 31, 2010 at 9:49 PM, Zhenhua Guo <jenvor@gmail.com> wrote:
>>
>>> Thanks!
>>> One more question. Is the input file replicated on each node where a
>>> mapper is run, or is just the portion processed by a mapper
>>> transferred?
>>>
>>> Gerald
>>>
>>> On Fri, Oct 29, 2010 at 10:11 AM, Harsh J <qwertymaniac@gmail.com> wrote:
>>> > Hello,
>>> >
>>> > On Fri, Oct 29, 2010 at 12:45 PM, Jeff Zhang <zjffdu@gmail.com> wrote:
>>> >> The TaskTracker tells the JobTracker how many free slots it has through
>>> >> the heartbeat, and the JobTracker chooses the best TaskTracker, taking
>>> >> data locality into account.
>>> >
>>> > Yes. To add some more: a scheduler is responsible for assigning
>>> > tasks (based on various stats, including data locality) to proper
>>> > TaskTrackers. Scheduler.assignTasks(TaskTracker) is used to assign a
>>> > TaskTracker its tasks, and the scheduler type is configurable (some
>>> > examples are the Eager/FIFO scheduler, the Capacity scheduler, etc.).
>>> >
>>> > This scheduling is done when a heartbeat response is to be sent back
>>> > to a TaskTracker that called JobTracker.heartbeat(...).
>>> >
>>> >>
>>> >>
>>> >> On Thu, Oct 28, 2010 at 2:52 PM, Zhenhua Guo <jenvor@gmail.com> wrote:
>>> >>> Hi, all
>>> >>>  I wonder how Hadoop schedules mappers and reducers (e.g., does it consider
>>> >>> load balancing or affinity to data?). For example, how does it decide on
>>> >>> which nodes mappers and reducers are executed, and when?
>>> >>>  Thanks!
>>> >>>
>>> >>> Gerald
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Best Regards
>>> >>
>>> >> Jeff Zhang
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Harsh J
>>> > www.harshj.com
>>> >
>>>
>>
>
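
The thread describes a two-step process. On each TaskTracker heartbeat, the pluggable scheduler decides which job to serve. Shared framework code then picks a task from that job, preferring node-local input, then rack-local, then non-local.

Below is a minimal Java sketch of that ordering under a toy model. Several names are hypothetical and are not Hadoop's actual API:

- the classes `PendingTask`, `SimpleJob`, and `FifoSketchScheduler`;
- the method `heartbeat(host, freeSlots)`;
- the host-to-rack map.

The real entry points named in the thread are `JobTracker.heartbeat(...)` and the scheduler's `assignTasks`. Retries of failed tasks and speculative execution are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical toy model; NOT Hadoop's real classes or method signatures.

/** A pending map task plus the hosts holding a replica of its input split. */
final class PendingTask {
    final String id;
    final List<String> splitHosts;
    PendingTask(String id, List<String> splitHosts) {
        this.id = id;
        this.splitHosts = splitHosts;
    }
}

/** A job with pending tasks; task selection here is the shared, scheduler-independent step. */
final class SimpleJob {
    final String name;
    final List<PendingTask> pending = new ArrayList<>();
    SimpleJob(String name) { this.name = name; }

    /** Picks a node-local task, else a rack-local one, else any remaining (non-local) task. */
    PendingTask obtainTask(String host, Map<String, String> rackOf) {
        PendingTask chosen = find(t -> t.splitHosts.contains(host));
        if (chosen == null) {
            String rack = rackOf.get(host);
            chosen = find(t -> t.splitHosts.stream()
                    .anyMatch(h -> rack != null && rack.equals(rackOf.get(h))));
        }
        if (chosen == null && !pending.isEmpty()) {
            chosen = pending.get(0); // non-local: the input will be read over the network
        }
        if (chosen != null) {
            pending.remove(chosen);
        }
        return chosen;
    }

    private PendingTask find(Predicate<PendingTask> matches) {
        for (PendingTask t : pending) {
            if (matches.test(t)) return t;
        }
        return null;
    }
}

/** Scheduler policy layer: here a plain FIFO over jobs in submission order. */
final class FifoSketchScheduler {
    private final List<SimpleJob> jobQueue = new ArrayList<>();
    private final Map<String, String> rackOf; // host -> rack, e.g. "h1" -> "/r1"
    FifoSketchScheduler(Map<String, String> rackOf) { this.rackOf = rackOf; }

    void submit(SimpleJob job) { jobQueue.add(job); }

    /** Builds the assignment for a heartbeat from a TaskTracker reporting freeSlots. */
    List<String> heartbeat(String host, int freeSlots) {
        List<String> assigned = new ArrayList<>();
        for (SimpleJob job : jobQueue) {              // policy: which job first
            while (assigned.size() < freeSlots) {
                PendingTask t = job.obtainTask(host, rackOf); // framework: which task
                if (t == null) break;                 // job has nothing left; try next job
                assigned.add(job.name + ":" + t.id);
            }
        }
        return assigned;
    }
}
```

For example, suppose `h1` and `h2` share rack `/r1`, and a job has splits on `h3`, `h2`, and `h1`. Then successive one-slot heartbeats from `h1` first receive the node-local task, then the rack-local one, and only then the off-rack one.

Swapping `FifoSketchScheduler` for a fairness- or capacity-based policy would change only the job loop. `obtainTask` stays the same, which matches the point above that task selection within a job is shared by all schedulers.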
