hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Question regarding HADOOP scheduler
Date Mon, 25 Jun 2012 07:17:46 GMT
Subramanian,

Yes, and that is why the scheduler is pluggable. See the Capacity
Scheduler and Fair Scheduler descriptions; both implement something
similar to what you describe. The default scheduler is purely FIFO,
which is why you see this behavior.

Fair Scheduler:
http://hadoop.apache.org/common/docs/stable/fair_scheduler.html
Capacity Scheduler:
http://hadoop.apache.org/common/docs/stable/capacity_scheduler.html
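
To make the difference concrete, here is a minimal Java sketch of the
decision made on each heartbeat. It is a toy model, not Hadoop's scheduler
API: SchedulerSketch, Job, pickFifo and pickLocalityFirst are hypothetical
names, the host names are made up, and the real Fair and Capacity
schedulers also weigh shares and queue capacities, which this ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types exist in Hadoop.
public class SchedulerSketch {

    /** A queued job plus the hosts holding replicas of its next input split. */
    static final class Job {
        final String name;
        final Set<String> splitHosts;

        Job(String name, String... hosts) {
            this.name = name;
            this.splitHosts = new HashSet<>(Arrays.asList(hosts));
        }
    }

    /** FIFO: always serve the job at the head of the queue, local or not. */
    static Job pickFifo(List<Job> queue, String heartbeatHost) {
        return queue.isEmpty() ? null : queue.get(0);
    }

    /**
     * Locality-first (assumed policy for illustration): serve the earliest
     * queued job with a split on the heartbeating host, else fall back to FIFO.
     */
    static Job pickLocalityFirst(List<Job> queue, String heartbeatHost) {
        for (Job job : queue) {
            if (job.splitHosts.contains(heartbeatHost)) {
                return job;
            }
        }
        return pickFifo(queue, heartbeatHost);
    }

    public static void main(String[] args) {
        List<Job> queue = new ArrayList<>();
        queue.add(new Job("jobA", "host1", "host2")); // submitted first
        queue.add(new Job("jobB", "host3"));          // submitted second

        // A tasktracker on host3 reports a free slot.
        System.out.println(pickFifo(queue, "host3").name);          // jobA
        System.out.println(pickLocalityFirst(queue, "host3").name); // jobB
    }
}
```

With FIFO, the free slot on host3 goes to jobA even though only jobB has
data there, which is the situation you describe.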

On Mon, Jun 25, 2012 at 12:24 PM, Subramanian Ganapathy
<subramanian.ganapathy86@gmail.com> wrote:
> Hi,
>
> While reading chapter 6 of the book "Hadoop: The Definitive Guide" (on how
> MapReduce works), I understood that tasktrackers send heartbeat messages
> indicating free slots where tasks may be scheduled. The job scheduler
> receives these heartbeats and, based on the sender's IP address, schedules
> a task of the next job whose input split is "closest", in the network
> topology sense, to the tasktracker that sent the message.
>
> My question is: isn't the scheduler needlessly restricting the throughput
> of the system? That is, what if another job that the scheduler did not
> pick has tasks that are more local to the current tasktracker, and by the
> time that job is picked, the tasktracker has no free slots? Wouldn't a
> shortest-job-first scheduling algorithm make a lot more sense w.r.t.
> throughput and latency?
>
> Best,
> Subramanian



-- 
Harsh J
