hadoop-yarn-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
Date Tue, 29 Sep 2015 02:22:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934506#comment-14934506 ]

Hudson commented on YARN-4180:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #431 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/431/])
YARN-4180. AMLauncher does not retry on failures when talking to NM. (adhoot) (adhoot: rev 9735afe967a660f356e953348cb6c34417f41055)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java


> AMLauncher does not retry on failures when talking to NM 
> ---------------------------------------------------------
>
>                 Key: YARN-4180
>                 URL: https://issues.apache.org/jira/browse/YARN-4180
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Anubhav Dhoot
>            Assignee: Anubhav Dhoot
>            Priority: Critical
>         Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch
>
>
> We see issues when the RM tries to launch a container while an NM is restarting: the
> launch fails with exceptions such as NMNotReadyException. YARN-3842 added retries for other
> clients of the NM (mainly AMs), but AMLauncher in the RM does not use them, so these
> intermittent errors cause job failures. This can manifest during a rolling restart of NMs.
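
The failure mode in the description, a caller that gives up on the first "not ready" response from a restarting NM, can be illustrated with a small retry wrapper. This is a minimal sketch in plain Java, not the YARN-4180 patch: RetrySketch, NotReadyException, callWithRetry and the retry and sleep parameters are all hypothetical names chosen for illustration, and no Hadoop classes are used.

```java
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from the Hadoop code base.
final class RetrySketch {

    /** Stand-in for a transient "NodeManager not ready" failure. */
    static final class NotReadyException extends RuntimeException {
        NotReadyException(String message) {
            super(message);
        }
    }

    /**
     * Runs the call, retrying up to maxRetries extra times when it fails
     * with NotReadyException and sleeping sleepMillis between attempts.
     * Any other exception propagates immediately.
     */
    static <T> T callWithRetry(Supplier<T> call, int maxRetries, long sleepMillis) {
        int attempt = 0;
        while (true) {
            try {
                return call.get();
            } catch (NotReadyException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: surface the last failure
                }
                attempt++;
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
    }

    public static void main(String[] args) {
        // Simulated NM that is "not ready" for the first two calls.
        int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new NotReadyException("NM not ready yet");
            }
            return "container started";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With the simulated NM above, the call succeeds on the third attempt. Without the wrapper, the first NotReadyException would reach the caller directly, which corresponds to the job failures described in the report.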



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
