hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1588) Rebind NM tokens for previous attempt's running containers to the new attempt
Date Tue, 25 Feb 2014 02:30:19 GMT

    [ https://issues.apache.org/jira/browse/YARN-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911149#comment-13911149

Jian He commented on YARN-1588:

bq. Not sure why we are undoing the locking in NMToken PBImpl
That was to fix the inconsistent-synchronization issue; I moved equals and hashCode to the
NMToken base class.
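
As a rough illustration of that change, the sketch below defines equals and hashCode once on an abstract base class, purely in terms of its getters, so an implementation class does not need its own (possibly inconsistently locked) copies. TokenRecord and TokenRecordImpl are hypothetical stand-ins, not the actual NMToken or NMTokenPBImpl classes, and the fields compared are an assumption.

```java
import java.util.Objects;

// Hypothetical stand-in for an abstract record type such as NMToken.
abstract class TokenRecord {
    abstract String getNodeId();
    abstract String getToken();

    // Defined once here, using only the getters, so subclasses inherit
    // a single equality definition instead of re-implementing it.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof TokenRecord)) {
            return false;
        }
        TokenRecord other = (TokenRecord) obj;
        return Objects.equals(getNodeId(), other.getNodeId())
            && Objects.equals(getToken(), other.getToken());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getNodeId(), getToken());
    }
}

// Hypothetical implementation class; all field access goes through
// synchronized getters, so locking stays consistent.
final class TokenRecordImpl extends TokenRecord {
    private final String nodeId;
    private final String token;

    TokenRecordImpl(String nodeId, String token) {
        this.nodeId = nodeId;
        this.token = token;
    }

    @Override
    synchronized String getNodeId() {
        return nodeId;
    }

    @Override
    synchronized String getToken() {
        return token;
    }
}
```

Two TokenRecordImpl instances with the same node id and token compare equal and hash identically, without the implementation class touching its fields outside the getters.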

bq. we are not handling DNS failure when generating NMTokens.
Thanks for pointing that out. UnknownHostException turns out to be a kind of IOException, which
RMProxy can retry in the RPC layer. The patch now throws UnknownHostException directly and
expects it to be retried within the RPC layer, and so the AM itself is unaware of the
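
To illustrate why no special handling is needed: java.net.UnknownHostException extends java.io.IOException, so a retry loop that catches IOException also covers DNS failures. The sketch below is a minimal, self-contained retry helper; RpcCall and RetrySketch are hypothetical names and do not reflect how RMProxy or its retry policies are actually implemented.

```java
import java.io.IOException;
import java.net.UnknownHostException;

// Hypothetical functional interface standing in for a single RPC call.
interface RpcCall<T> {
    T invoke() throws IOException;
}

final class RetrySketch {
    private RetrySketch() {
    }

    // Invokes the call up to maxAttempts times, retrying on any IOException.
    // UnknownHostException is a subclass of IOException, so it is retried too.
    static <T> T callWithRetry(RpcCall<T> call, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.invoke();
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed; surface the most recent error
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        String result = callWithRetry(() -> {
            // Simulate two DNS failures before the call succeeds.
            if (calls[0]++ < 2) {
                throw new UnknownHostException("nm-host.example");
            }
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

A real retry layer would normally add a delay between attempts and give up on non-retriable errors; both are omitted here to keep the sketch short.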

Changed AMRMClient to populate the NMTokens from previous attempts into NMTokenCache so that
this works in a secure cluster.
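
The idea can be sketched as follows: when the new attempt registers, the tokens handed over for the previous attempt's containers are copied into a per-node cache, so later calls to those NodeManagers find a token. NodeToken and TokenCacheSketch are hypothetical stand-ins written for this illustration; they are not the real NMToken or NMTokenCache classes, and the method names are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an NMToken: a node address plus an opaque token.
final class NodeToken {
    final String nodeId;
    final String token;

    NodeToken(String nodeId, String token) {
        this.nodeId = nodeId;
        this.token = token;
    }
}

// Hypothetical stand-in for a token cache keyed by node address.
final class TokenCacheSketch {
    private final Map<String, String> tokensByNode = new ConcurrentHashMap<>();

    void setToken(String nodeId, String token) {
        tokensByNode.put(nodeId, token);
    }

    // Returns the cached token for the node, or null if none is known.
    String getToken(String nodeId) {
        return tokensByNode.get(nodeId);
    }

    // Copies the tokens received at registration (those belonging to the
    // previous attempt's containers) into the cache.
    void populateFromPreviousAttempts(List<NodeToken> previousTokens) {
        for (NodeToken t : previousTokens) {
            setToken(t.nodeId, t.token);
        }
    }
}
```

After populateFromPreviousAttempts runs, a lookup for a node that hosted a transferred container returns its token, while a node the application has never used still returns null.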

Testing on a secure cluster:
- Code change: changed distributed shell to immediately call nmClientAsync.getContainerStatusAsync
for each transferred container after it gets the containers from previous attempts.
    List<Container> previousAMRunningContainers =
    LOG.info("Received " + previousAMRunningContainers.size()
        + " previous AM's running containers on AM registration.");
    for (Container container : previousAMRunningContainers) {
      nmClientAsync.getContainerStatusAsync(container.getId(), container.getNodeId());
    }
- Started the distributed shell with a long sleep command.
- Killed the ApplicationMaster.
- The new AM started; after it calls getContainersFromPreviousAttempts, it calls getContainerStatus
for each transferred container, so the transferred NMToken should be used to talk to
the corresponding NM.

> Rebind NM tokens for previous attempt's running containers to the new attempt
> -----------------------------------------------------------------------------
>                 Key: YARN-1588
>                 URL: https://issues.apache.org/jira/browse/YARN-1588
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-1588.1.patch, YARN-1588.1.patch, YARN-1588.2.patch, YARN-1588.3.patch,

This message was sent by Atlassian JIRA
