hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
Date Tue, 20 May 2014 22:22:39 GMT

     [ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-2074:
--------------------------

    Attachment: YARN-2074.1.patch

Patch to not count AM preemption as an AM failure.
The patch checks the diagnostics of the attempt to determine whether the attempt was preempted
or not.
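
To illustrate the idea only, here is a minimal sketch of a diagnostics-based check. It is not
the code in YARN-2074.1.patch: the class, the Attempt type, the method names and the
PREEMPTED_MARKER string are all hypothetical stand-ins, and the real RM classes are not used.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from the actual patch.
final class AmPreemptionSketch {

    // Assumed marker text; the string the patch really looks for is not shown in this message.
    static final String PREEMPTED_MARKER = "Container preempted by scheduler";

    // Simplified stand-in for an application attempt record.
    static final class Attempt {
        final boolean failed;
        final String diagnostics;

        Attempt(boolean failed, String diagnostics) {
            this.failed = failed;
            this.diagnostics = diagnostics;
        }
    }

    // An attempt is treated as preempted if its diagnostics mention the marker.
    static boolean isPreempted(Attempt attempt) {
        return attempt.diagnostics != null
                && attempt.diagnostics.contains(PREEMPTED_MARKER);
    }

    // Count only failed attempts that were not caused by preemption.
    static int countedFailures(List<Attempt> attempts) {
        int count = 0;
        for (Attempt attempt : attempts) {
            if (attempt.failed && !isPreempted(attempt)) {
                count++;
            }
        }
        return count;
    }

    // The application fails only when the counted failures reach the limit.
    static boolean shouldFailApplication(List<Attempt> attempts, int maxAmAttempts) {
        return countedFailures(attempts) >= maxAmAttempts;
    }

    public static void main(String[] args) {
        List<Attempt> attempts = Arrays.asList(
                new Attempt(true, PREEMPTED_MARKER),
                new Attempt(true, "AM container exited with a non-zero exit code"),
                new Attempt(true, PREEMPTED_MARKER));

        System.out.println("counted failures: " + countedFailures(attempts));
        System.out.println("fail application: " + shouldFailApplication(attempts, 2));
    }
}
```

With a limit of 2, the three failed attempts above do not fail the application in this sketch,
because two of them carry the preemption marker and are skipped.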

There's a race condition related to RM restart which is not addressed in this patch. If the
attempt is preempted and the RM restarts before the attempt state is saved in the state store,
the new RM won't be able to figure out whether the previous attempt was preempted or not.
Fixing this may require an NM-RM protocol change to tell the NM whether the AM container was
preempted or killed, so that when the RM recovers, the NM can report back whether the previous
AM container was preempted or not. In addition, the RMContainer transitions may also need to be
changed accordingly. We may fix this in a separate JIRA.


> Preemption of AM containers shouldn't count towards AM failures
> ---------------------------------------------------------------
>
>                 Key: YARN-2074
>                 URL: https://issues.apache.org/jira/browse/YARN-2074
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Jian He
>         Attachments: YARN-2074.1.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers
> getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue and not
> count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
