hadoop-common-dev mailing list archives

From Brad Childs <...@redhat.com>
Subject Re: Yarn / mapreduce scheduling
Date Thu, 03 Apr 2014 20:22:54 GMT
Sandy, Shekhar: thank you very much for the helpful responses.

One last question/clarification: getFileBlockLocations(..) in the FileSystem API is the
only file-->node mapping that I'm aware of, and it seems the only place it's called is in
the map/reduce client (FileInputFormat, MultiFileSplit).

Is this accurate?

1) The job client calls getFileBlockLocations(..) and serializes this info to a file on the DFS.
2) The scheduler reads this information to determine locality.

Or am I missing another call/method for determining node location for block locality? I ask
because I haven't located the spot in the source where the scheduler reads the block location file.
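
For reference, the three locality levels in the assignContainersOnNode code quoted below
(node-local, rack-local, off-switch) can be sketched as follows. This is a minimal, hypothetical
illustration in plain Java, not Hadoop code: LocalitySketch, classify, and the rackOf map are
names made up here, and splitHosts merely stands in for the host names that
getFileBlockLocations(..) reports for a split.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these names come from the Hadoop source. */
public class LocalitySketch {

    /** The three locality levels, in order of preference. */
    public enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    /**
     * Classify how close a candidate node is to the hosts that hold a split's blocks.
     *
     * @param node       the node offering a container
     * @param splitHosts hosts holding the split's data (assumed input)
     * @param rackOf     assumed host-to-rack mapping
     */
    public static Locality classify(String node,
                                    List<String> splitHosts,
                                    Map<String, String> rackOf) {
        // Best case: the data lives on the candidate node itself.
        if (splitHosts.contains(node)) {
            return Locality.NODE_LOCAL;
        }
        // Next best: some host holding the data shares the candidate's rack.
        String nodeRack = rackOf.get(node);
        if (nodeRack != null) {
            for (String host : splitHosts) {
                if (nodeRack.equals(rackOf.get(host))) {
                    return Locality.RACK_LOCAL;
                }
            }
        }
        // Otherwise the data must cross racks.
        return Locality.OFF_SWITCH;
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = Map.of("n1", "/r1", "n2", "/r1", "n3", "/r2");
        List<String> splitHosts = List.of("n1");
        for (String node : List.of("n1", "n2", "n3")) {
            System.out.println(node + " -> " + classify(node, splitHosts, rackOf));
        }
    }
}
```

With the made-up topology in main, n1 is classified node-local, n2 rack-local, and n3 off-switch.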

-bc


----- Original Message -----
> From: "Sandy Ryza" <sandy.ryza@cloudera.com>
> To: common-dev@hadoop.apache.org
> Sent: Thursday, April 3, 2014 2:38:42 PM
> Subject: Re: Yarn / mapreduce scheduling
> 
> The equivalent code in the Fair Scheduler is in AppSchedulable.java,
> under assignContainer(FSSchedulerNode node, boolean reserved).
> 
> YARN uses delay scheduling (
> http://people.csail.mit.edu/matei/papers/2010/eurosys_delay_scheduling.pdf)
> for achieving data-locality.
> 
> -Sandy
> 
> 
> On Thu, Apr 3, 2014 at 11:16 AM, Shekhar Gupta <shkhrgpt@gmail.com> wrote:
> 
> > Hi Brad,
> >
> >     YARN scheduling does take care of data locality. In YARN, tasks are not
> > assigned based on capacity. Actually, a certain number of containers are
> > allocated on every node based on the node's capacity. Tasks are executed on
> > those containers. While scheduling tasks on containers, the YARN scheduler
> > satisfies data locality requirements. I am not very familiar with the Fair
> > Scheduler, but if you check the source of FifoScheduler you will find a
> > function 'assignContainersOnNode' which looks like the following:
> >
> > private int assignContainersOnNode(FiCaSchedulerNode node,
> >       FiCaSchedulerApp application, Priority priority
> >   ) {
> >     // Data-local
> >     int nodeLocalContainers =
> >       assignNodeLocalContainers(node, application, priority);
> >
> >     // Rack-local
> >     int rackLocalContainers =
> >       assignRackLocalContainers(node, application, priority);
> >
> >     // Off-switch
> >     int offSwitchContainers =
> >       assignOffSwitchContainers(node, application, priority);
> >
> >
> >     LOG.debug("assignContainersOnNode:" +
> >         " node=" + node.getRMNode().getNodeAddress() +
> >         " application=" + application.getApplicationId().getId() +
> >         " priority=" + priority.getPriority() +
> >         " #assigned=" +
> >         (nodeLocalContainers + rackLocalContainers + offSwitchContainers));
> >
> >
> >     return (nodeLocalContainers + rackLocalContainers +
> >         offSwitchContainers);
> >   }
> >
> > In this routine you will find that data-local tasks are scheduled first,
> > then rack-local, and then off-switch.
> >
> > After this you may find a similar function in the Fair Scheduler too.
> >
> > I hope this helps. Let me know if you have more questions or if something is
> > wrong in my reasoning.
> >
> > Regards,
> > Shekhar
> >
> >
> > On Thu, Apr 3, 2014 at 10:56 AM, Brad Childs <bdc@redhat.com> wrote:
> >
> > > Sorry if this is the wrong list; I am looking for deep technical/Hadoop
> > > source help :)
> > >
> > > How does job scheduling work on the YARN framework for map/reduce jobs? I
> > > see the YARN scheduler discussed here:
> > > https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html
> > > which leads me to believe tasks are scheduled based on node capacity and
> > > not data locality. I've sifted through the fair scheduler and can't find
> > > anything about data location or locality.
> > >
> > > Where does data locality play into the scheduling of map/reduce tasks on
> > > YARN? Can someone point me to the Hadoop 2.x source where the data block
> > > location is used to calculate node/container/task assignment (if that's
> > > still happening)?
> > >
> > >
> > >
> > > -bc
> > >
> > >
> >
> 
