hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
Date Wed, 25 Jun 2014 22:12:26 GMT

    [ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044118#comment-14044118 ]

Jian He commented on YARN-614:
------------------------------

Xuan, can you enumerate the failures that should not be counted towards AM failures,
along with the corresponding AM container exit codes? It seems ABORTED and KILL_BY_RESOURCEMANAGER
are used for other sources too. If necessary, we need to create separate exit codes for these
particular cases. Can you also update the title/description to reflect what this patch is doing? Thanks.

> Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-614
>                 URL: https://issues.apache.org/jira/browse/YARN-614
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>             Fix For: 2.5.0
>
>         Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, YARN-614-3.patch,
>                      YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, YARN-614.7.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be retried
> unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or
> YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come
> to mind.
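
The rule in the quoted description (retry only when the hardware or YARN itself is at fault, and do not count those attempts against the AM) can be sketched roughly as below. This is only an illustration, not the YARN-614 patch: `AmExitCause`, `AmRetryPolicy`, and every constant in them are hypothetical names, standing in for the separate exit codes discussed in the comment above, and are not YARN's actual container exit statuses.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical causes for an AM container exit. These are illustrative only
// and are NOT YARN's real exit status constants.
enum AmExitCause {
    NODE_LOST,        // the NM running the AM failed or was lost
    NM_DISK_FAILURE,  // NM disk errors
    YARN_ERROR,       // an error inside YARN itself
    USER_ERROR        // the application failed on its own
}

final class AmRetryPolicy {
    // Causes attributed to hardware or YARN; assumed not to count
    // towards the application's AM failure limit.
    private static final Set<AmExitCause> NOT_COUNTED = EnumSet.of(
        AmExitCause.NODE_LOST,
        AmExitCause.NM_DISK_FAILURE,
        AmExitCause.YARN_ERROR);

    private AmRetryPolicy() {}

    // True if an attempt that ended with this cause counts as an AM failure.
    static boolean countsTowardAmFailures(AmExitCause cause) {
        return !NOT_COUNTED.contains(cause);
    }

    // True if another attempt may be launched: only counted failures
    // are compared against the configured limit.
    static boolean shouldRetry(List<AmExitCause> previousAttempts, int maxCountedFailures) {
        long counted = previousAttempts.stream()
            .filter(AmRetryPolicy::countsTowardAmFailures)
            .count();
        return counted < maxCountedFailures;
    }
}
```

With a limit of 1, as the issue title suggests for the default, two attempts lost to node failures would still leave room for a retry, while a single user error would not.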



--
This message was sent by Atlassian JIRA
(v6.2#6252)
