hadoop-yarn-dev mailing list archives

From Bikas Saha <bi...@hortonworks.com>
Subject RE: scheduler satisfying heterogeneous resource requests at same priority
Date Fri, 04 Jan 2013 17:46:16 GMT
Most likely because mappers and reducers are scheduled at different
priorities.

To summarize, the issue seems to be that AppSchedulingInfo does not
maintain ResourceRequests by Resource capability. The alternative would be
to have ResourceRequest itself contain multiple capabilities, but IMO that
would be hard to work with and would also require major surgery on the
code base.
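
A minimal sketch of the point above, using simplified stand-in classes rather than the real Hadoop types. The names Request, SingleSlotInfo, PerCapabilityInfo and largestFitting are all hypothetical, and memory is the only resource modelled. It shows how a priority -> location -> request map keeps only the last request, and how one extra level keyed by capability would preserve both:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class SchedulingInfoSketch {

    /** Simplified stand-in for a ResourceRequest: one capability (memory only) plus a count. */
    static final class Request {
        final int priority;
        final String location;   // node name, rack name, or "*"
        final int memoryMb;
        final int numContainers;

        Request(int priority, String location, int memoryMb, int numContainers) {
            this.priority = priority;
            this.location = location;
            this.memoryMb = memoryMb;
            this.numContainers = numContainers;
        }
    }

    /** Shape described in this thread: priority -> location -> a single request. */
    static final class SingleSlotInfo {
        private final Map<Integer, Map<String, Request>> requests = new HashMap<>();

        void update(Request r) {
            // A second request at the same priority and location replaces the first.
            requests.computeIfAbsent(r.priority, p -> new HashMap<>()).put(r.location, r);
        }

        Request get(int priority, String location) {
            Map<String, Request> byLocation = requests.get(priority);
            return byLocation == null ? null : byLocation.get(location);
        }
    }

    /** Hypothetical alternative: priority -> location -> capability -> request. */
    static final class PerCapabilityInfo {
        private final Map<Integer, Map<String, TreeMap<Integer, Request>>> requests = new HashMap<>();

        void update(Request r) {
            requests.computeIfAbsent(r.priority, p -> new HashMap<>())
                    .computeIfAbsent(r.location, l -> new TreeMap<>())
                    .put(r.memoryMb, r);
        }

        /** Largest request at this priority and location that fits in freeMb, or null. */
        Request largestFitting(int priority, String location, int freeMb) {
            Map<String, TreeMap<Integer, Request>> byLocation = requests.get(priority);
            if (byLocation == null) return null;
            TreeMap<Integer, Request> byCapability = byLocation.get(location);
            if (byCapability == null) return null;
            Map.Entry<Integer, Request> entry = byCapability.floorEntry(freeMb);
            return entry == null ? null : entry.getValue();
        }
    }

    public static void main(String[] args) {
        Request small = new Request(1, "*", 1024, 1);
        Request large = new Request(1, "*", 2048, 1);

        SingleSlotInfo single = new SingleSlotInfo();
        single.update(small);
        single.update(large);
        System.out.println("single slot keeps: " + single.get(1, "*").memoryMb + " MB");

        PerCapabilityInfo perCap = new PerCapabilityInfo();
        perCap.update(small);
        perCap.update(large);
        Request fit = perCap.largestFitting(1, "*", 1024);
        System.out.println("fits in 1024 MB free: " + fit.memoryMb + " MB");
    }
}
```

Picking the largest request that fits is only one possible policy, chosen here to keep the lookup short. A real scheduler would also have to weigh reservations, locality and container counts.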

-----Original Message-----
From: Sandy Ryza [mailto:sandy.ryza@cloudera.com]
Sent: Wednesday, January 02, 2013 12:47 PM
To: yarn-dev@hadoop.apache.org
Subject: Re: scheduler satisfying heterogeneous resource requests at same
priority

Thanks for looking into it Bikas.  What you wrote makes sense to me.
You're right that it's the last request not the largest.  Otherwise, you
summarize my confusion well - why doesn't AppSchedulingInfo hold a list of
ResourceRequests for each node/priority?

I also don't understand why this hasn't already caused a problem for
mapreduce when mappers and reducers request different amounts of memory.
Is it because reduces are requested only after all map containers are
completed? Or because they're requested at non-overlapping locations?

On Wed, Jan 2, 2013 at 11:04 AM, Bikas Saha <bikas@hortonworks.com> wrote:

> Reading the code seems to suggest that AppSchedulingInfo is not
> preferring the larger request. It's simply returning the last request
> for that priority and hostname. So it could be that in your case, the
> larger request is the second request. You could try and make it the
> first request and check if you get the same results.
>
> Wrt your ResourceRequest question, having a single Resource
> capability simplifies ResourceRequest operations. Having heterogeneous
> resources is allowed by the API by submitting multiple
> ResourceRequests having different Resource capabilities. See the
> RMContainerRequestor code in the MR YARN app. Given the above, it
> looks like the Resource heterogeneity is lost inside the
> AppSchedulingInfo and that may be a bug or a conscious decision.
> Looking to folks experienced in that code for an answer. How is
> everything working despite this? Perhaps because the applications are
> not issuing heterogeneous requests for a given priority and location.
> Secondly, the * catch all is always around to save the day.
>
> Let me know if this makes sense. I may have missed stuff.
>
> -----Original Message-----
> From: Sandy Ryza [mailto:sandy.ryza@cloudera.com]
> Sent: Friday, December 28, 2012 4:46 PM
> To: yarn-dev@hadoop.apache.org
> Subject: scheduler satisfying heterogeneous resource requests at same
> priority
>
> I am trying to understand how YARN schedulers are able to satisfy
> smaller requests while larger requests are outstanding (per YARN-289).
>
> Consider the following situation:
> An application submits two requests - one for a container with 1024 MB
> and one for a container with 2048 MB.  1024 MB frees up on a node.
> The scheduler should (or might wish to) place the smaller container on
> the node, instead of placing a reservation for the larger one.
>
> However, currently, if I understand correctly, the larger request is
> always serviced first.  AppSchedulingInfo, which is used by all the
> schedulers to find a container request when space becomes available,
> stores a map of priorities to maps of node/rack/* to ResourceRequests.
> A ResourceRequest contains a single Resource (capability), and the
> number of containers.  Why does a ResourceRequest not allow for
> heterogeneous containers?  Is this just not supported yet because it
> hasn't been needed yet?  Or is there a more fundamental reason I'm
> missing about why it doesn't make sense?
>
> many thanks for any guidance,
> Sandy
>
