hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2561) MR job client cannot reconnect to AM after NM restart.
Date Thu, 18 Sep 2014 20:30:36 GMT

    [ https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139450#comment-14139450 ]

Jason Lowe commented on YARN-2561:
----------------------------------

bq. Yes. Update with adding with newNode instead of old rmNode.

Ah, I see now.  Before, I thought the node update event was just going to the scheduler, but
now I see it's sending an event to itself, which will eventually set totalCapability properly.

+1 on the latest patch pending Jenkins.

> MR job client cannot reconnect to AM after NM restart.
> ------------------------------------------------------
>
>                 Key: YARN-2561
>                 URL: https://issues.apache.org/jira/browse/YARN-2561
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Tassapol Athiapinya
>            Assignee: Junping Du
>            Priority: Blocker
>         Attachments: YARN-2561-v2.patch, YARN-2561-v3.patch, YARN-2561-v4.patch, YARN-2561-v5.patch, YARN-2561.patch
>
>
> Work-preserving NM restart is disabled.
> Submit a job, then restart the only NM. The job hangs with connect retries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
