hadoop-common-dev mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject Re: Yarn / mapreduce scheduling
Date Thu, 03 Apr 2014 19:38:42 GMT
The equivalent code in the Fair Scheduler is in AppSchedulable.java,
under assignContainer(FSSchedulerNode node, boolean reserved).

YARN uses delay scheduling (
http://people.csail.mit.edu/matei/papers/2010/eurosys_delay_scheduling.pdf)
for achieving data-locality.
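
As a rough illustration of the idea in that paper (not the actual
AppSchedulable code), here is a minimal sketch. The class name, the two
thresholds, and the counting of missed scheduling opportunities are all
assumptions made for the example: an application turns down non-local
offers until it has been skipped often enough, then relaxes first to
rack-local and finally to off-switch.

```java
import java.util.Set;

/**
 * Hypothetical sketch of delay scheduling for a single application.
 * Names and thresholds are illustrative assumptions, not YARN's code.
 */
final class DelaySchedulingSketch {
    enum Decision { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH, SKIP }

    // Offers to decline before a rack-local node is acceptable.
    private final int nodeLocalityDelay;
    // Offers to decline before any node (off-switch) is acceptable.
    private final int rackLocalityDelay;
    // Offers declined since the last container was assigned.
    private int missedOpportunities = 0;

    DelaySchedulingSketch(int nodeLocalityDelay, int rackLocalityDelay) {
        if (nodeLocalityDelay < 0 || rackLocalityDelay < nodeLocalityDelay) {
            throw new IllegalArgumentException(
                "expected 0 <= nodeLocalityDelay <= rackLocalityDelay");
        }
        this.nodeLocalityDelay = nodeLocalityDelay;
        this.rackLocalityDelay = rackLocalityDelay;
    }

    /** Decide what to do when the given node offers a free container. */
    Decision offer(String node, String rack,
                   Set<String> preferredNodes, Set<String> preferredRacks) {
        // A node that holds the data is always accepted immediately.
        if (preferredNodes.contains(node)) {
            missedOpportunities = 0;
            return Decision.NODE_LOCAL;
        }
        // A node on the right rack is accepted only after waiting a while.
        if (preferredRacks.contains(rack)
                && missedOpportunities >= nodeLocalityDelay) {
            missedOpportunities = 0;
            return Decision.RACK_LOCAL;
        }
        // Any node is accepted once the longer delay has elapsed.
        if (missedOpportunities >= rackLocalityDelay) {
            missedOpportunities = 0;
            return Decision.OFF_SWITCH;
        }
        // Otherwise decline the offer and wait for a better one.
        missedOpportunities++;
        return Decision.SKIP;
    }
}
```

The sketch counts declined offers rather than elapsed time; either works
for showing the trade-off, which is a short wait in exchange for a better
chance of running where the data lives.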

-Sandy


On Thu, Apr 3, 2014 at 11:16 AM, Shekhar Gupta <shkhrgpt@gmail.com> wrote:

> Hi Brad,
>
>     YARN scheduling does take care of data locality. In YARN, tasks are not
> assigned based on capacity. Rather, a certain number of containers is
> allocated on every node based on the node's capacity, and tasks are executed
> in those containers. While scheduling tasks on containers, the YARN scheduler
> satisfies data locality requirements. I am not very familiar with the Fair
> Scheduler, but if you check the source of FifoScheduler you will find a
> function 'assignContainersOnNode', which looks like the following:
>
> private int assignContainersOnNode(FiCaSchedulerNode node,
>       FiCaSchedulerApp application, Priority priority
>   ) {
>     // Data-local
>     int nodeLocalContainers =
>       assignNodeLocalContainers(node, application, priority);
>
>     // Rack-local
>     int rackLocalContainers =
>       assignRackLocalContainers(node, application, priority);
>
>     // Off-switch
>     int offSwitchContainers =
>       assignOffSwitchContainers(node, application, priority);
>
>
>     LOG.debug("assignContainersOnNode:" +
>         " node=" + node.getRMNode().getNodeAddress() +
>         " application=" + application.getApplicationId().getId() +
>         " priority=" + priority.getPriority() +
>         " #assigned=" +
>         (nodeLocalContainers + rackLocalContainers + offSwitchContainers));
>
>
>     return (nodeLocalContainers + rackLocalContainers +
>         offSwitchContainers);
>   }
>
> In this routine you will find that data-local tasks are scheduled first,
> then rack-local, and then off-switch.
>
> You may find a similar function in the Fair Scheduler too.
>
> I hope this helps. Let me know if you have more questions or if something
> is wrong in my reasoning.
>
> Regards,
> Shekhar
>
>
> On Thu, Apr 3, 2014 at 10:56 AM, Brad Childs <bdc@redhat.com> wrote:
>
> > Sorry if this is the wrong list, I am looking for deep technical/Hadoop
> > source help :)
> >
> > How does job scheduling work on the YARN framework for MapReduce jobs?  I
> > see the YARN scheduler discussed here:
> > https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html
> > which leads me to believe tasks are scheduled based on node capacity and
> > not data locality.  I've sifted through the Fair Scheduler and can't find
> > anything about data location or locality.
> >
> > Where does data locality play into the scheduling of map/reduce tasks on
> > YARN?  Can someone point me to the Hadoop 2.x source where the data block
> > location is used to calculate node/container/task assignment (if that's
> > still happening)?
> >
> >
> >
> > -bc
> >
> >
>
