hadoop-yarn-issues mailing list archives

From "Carlo Curino (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1434) Single Job can affect fairshare of others
Date Fri, 22 Nov 2013 22:31:35 GMT

    [ https://issues.apache.org/jira/browse/YARN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830368#comment-13830368 ]

Carlo Curino commented on YARN-1434:
------------------------------------

Srikanth, what we observed (again in a noisy environment, so this still needs to be validated)
is that an AM that returns containers keeps its position as "under capacity" relative to the
other jobs: because it has just returned a bunch of containers, it is picked again as the
highest in priority. As a consequence, it wastes containers in a way that, in our small setup,
was harming the other jobs' opportunity to get access to containers.
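To make the mechanism concrete, here is a deliberately simplified simulation. It is a toy model under stated assumptions, not the CapacityScheduler or FairScheduler logic: two jobs with equal shares on a 10-container cluster; a scheduler that hands each free container to the requesting job currently holding the fewest containers; job "A" yielding every container back unused at its next heartbeat; and job "B" keeping each container busy for 3 ticks. `simulate` and its parameters are hypothetical names.

```python
from collections import deque


def simulate(ticks: int = 100, capacity: int = 10, task_len: int = 3,
             yielder_active: bool = True) -> dict[str, float]:
    """Toy model (not YARN code): fraction of container-ticks held by each job.

    Job "A" yields back every container at its next heartbeat without using it;
    job "B" keeps each container busy for `task_len` ticks and always has
    pending work. Free containers go, one at a time, to the requesting job
    that currently holds the fewest containers (equal fair shares).
    """
    held = {"A": 0, "B": 0}
    b_finish = deque()          # completion tick of each of B's running containers
    usage = {"A": 0, "B": 0}    # container-ticks held per job
    for t in range(ticks):
        held["A"] = 0           # heartbeat: A returns everything it got last tick
        while b_finish and b_finish[0] <= t:
            b_finish.popleft()  # B's finished tasks free their containers
            held["B"] -= 1
        requesters = ["A", "B"] if yielder_active else ["B"]
        for _ in range(capacity - held["A"] - held["B"]):
            job = min(requesters, key=held.get)  # most under-served; ties go to "A"
            held[job] += 1
            if job == "B":
                b_finish.append(t + task_len)
        for job in held:
            usage[job] += held[job]
    return {job: usage[job] / (ticks * capacity) for job in usage}


print(simulate())                      # A yields: {'A': 0.5, 'B': 0.5}
print(simulate(yielder_active=False))  # A idle:   {'A': 0.0, 'B': 1.0}
```

In this model A never runs a task, yet because it looks empty at every heartbeat it is always refilled up to B's level, so B is held to half of the cluster while the other half does no work. The model is too coarse to say whether the effect matters at scale.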

If Robert has a few spare cycles, he will try to make a minimal patch to the MR AM that makes
it behave maliciously and try again on the CapacityScheduler. Maybe Sandy could try it
with the fair scheduler?

If we confirm that this is indeed a problem, and that it is substantial for non-trivial scenarios
(we noticed it with 2 jobs in 2 queues on 10 machines and are not sure whether it has an impact
at scale), we might need to tweak the schedulers' logic to penalize users that yield back lots
of containers (e.g., by accounting for those containers against the user's quota for n seconds
or something similar).
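The penalty floated above (charging yielded containers against the user's quota for n seconds) could look roughly like the following minimal sketch. `PenalizedUsage`, its methods, and `penalty_secs` are hypothetical names, not part of the CapacityScheduler or FairScheduler. The sketch assumes the scheduler can tell containers yielded back unused (`yield_back`) apart from containers released after doing work (`complete`), and that calls arrive with non-decreasing timestamps.

```python
from collections import defaultdict, deque


class PenalizedUsage:
    """Hypothetical accounting helper: containers a job yields back unused keep
    counting against its usage for `penalty_secs` seconds."""

    def __init__(self, penalty_secs: float) -> None:
        self.penalty_secs = penalty_secs
        self.held = defaultdict(int)  # containers currently held per job
        # Per job: (expiry_time, count) entries for yielded containers.
        # Expiries are appended in order, assuming `now` never decreases.
        self.penalties = defaultdict(deque)

    def allocate(self, job: str, count: int = 1) -> None:
        self.held[job] += count

    def complete(self, job: str, count: int = 1) -> None:
        """Containers released after doing real work: no penalty."""
        self.held[job] -= count

    def yield_back(self, job: str, count: int, now: float) -> None:
        """Containers returned unused: still charged until now + penalty_secs."""
        self.held[job] -= count
        self.penalties[job].append((now + self.penalty_secs, count))

    def effective_usage(self, job: str, now: float) -> int:
        q = self.penalties[job]
        while q and q[0][0] <= now:  # drop penalties that have expired
            q.popleft()
        return self.held[job] + sum(count for _, count in q)

    def pick(self, jobs: list[str], now: float) -> str:
        """Next job to receive a container: lowest usage after penalties."""
        return min(jobs, key=lambda j: self.effective_usage(j, now))


acct = PenalizedUsage(penalty_secs=10.0)
acct.allocate("A", 5)
acct.allocate("B", 3)
acct.yield_back("A", 5, now=0.0)       # A returns all 5 containers unused
print(acct.pick(["A", "B"], now=1.0))   # B  (A is still charged for 5)
print(acct.pick(["A", "B"], now=11.0))  # A  (the penalty has expired)
```

Keeping the penalty separate from `held` means the cluster's real free capacity is unaffected; only the ordering used to decide who is "most under capacity" changes.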


> Single Job can affect fairshare of others
> -----------------------------------------
>
>                 Key: YARN-1434
>                 URL: https://issues.apache.org/jira/browse/YARN-1434
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Carlo Curino
>            Priority: Minor
>
> A job that receives containers, decides not to use them, and yields them back in the
> next heartbeat could significantly affect the amount of resources given to other jobs.
> This is because, by yielding containers back, the job always appears to be under capacity
> (more than the others), so it is picked as the next to receive containers.
> Observed by Robert Grandl, to be independently confirmed.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
