hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart
Date Mon, 19 May 2014 20:35:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002344#comment-14002344 ]

Jian He commented on YARN-1366:

bq. When the RM comes back up, how does it differentiate between v1 and v2, keeping v2 and
asking v1 to exit? Does this already work?
There is a response map in AMS (ApplicationMasterService) that differentiates the attempts, so I think this should already work.
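A minimal sketch of how a map keyed by attempt could let the RM tell a stale attempt (v1) from the current one (v2) after a restart. This is not the actual ApplicationMasterService code: `AttemptResponseMap`, `AllocateOutcome`, and their methods are hypothetical, attempt ids are plain strings, and the sketch assumes the RM has recovered which attempt is current for each application.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical result of an allocate call (not YARN's AMCommand/AllocateResponse).
enum AllocateOutcome { OK, RESYNC, SHUTDOWN }

// Hypothetical RM-side bookkeeping keyed by application attempt id.
class AttemptResponseMap {
    // Current attempt per application, as recovered after the RM restart (assumption).
    private final Map<String, String> currentAttempt = new HashMap<>();
    // Last response id per attempt; only attempts registered with this RM instance appear here.
    private final Map<String, Integer> lastResponseId = new HashMap<>();

    synchronized void recoverApp(String appId, String attemptId) {
        currentAttempt.put(appId, attemptId);
    }

    synchronized void register(String appId, String attemptId) {
        if (!attemptId.equals(currentAttempt.get(appId))) {
            throw new IllegalStateException("Not the current attempt: " + attemptId);
        }
        lastResponseId.put(attemptId, 0);
    }

    synchronized AllocateOutcome allocate(String appId, String attemptId) {
        if (!attemptId.equals(currentAttempt.get(appId))) {
            // An older attempt (e.g. v1 while v2 is current) is told to exit.
            return AllocateOutcome.SHUTDOWN;
        }
        Integer last = lastResponseId.get(attemptId);
        if (last == null) {
            // Current attempt that has not registered since the restart: ask it to resync.
            return AllocateOutcome.RESYNC;
        }
        lastResponseId.put(attemptId, last + 1);
        return AllocateOutcome.OK;
    }
}
```

Keeping the per-attempt entries separate from the per-application "current attempt" is what lets the two cases differ: a missing entry for the current attempt means "resync", while a non-current attempt means "exit".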
bq. It would be easier for users if the RM would simply accept the first register from the
app and the last finishApplicationMaster() without needing a resync.
bq. For the case where the AM's last heartbeat has been sent to the RM and the RM restarted before finishApplicationMaster()
was called, does ApplicationMasterService send a resync?
It seems we have a race in which an allocate call receives the resync and re-registers even after
finishApplicationMaster has been called. I checked the MR code: this cannot happen there, because
the allocate thread is interrupted and joined before unregister is called. Should we document in the
API that allocate must not be called after finishApplicationMaster, or handle this case explicitly
in the RM?
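The MR ordering described above (interrupt and join the allocate thread, then unregister) can be sketched as follows. `RmClient` and `HeartbeatingAm` are hypothetical stand-ins rather than the real AM-RM client, and the heartbeat loop is simplified; the sketch only shows why no allocate call, and therefore no resync and re-register, can follow finishApplicationMaster.

```java
// Hypothetical stand-in for the AM's connection to the RM.
interface RmClient {
    void allocate() throws InterruptedException;
    void finishApplicationMaster();
}

// Hypothetical AM that heartbeats on a dedicated allocate thread.
class HeartbeatingAm {
    private final RmClient rm;
    private final long intervalMs;
    private final Thread allocator;

    HeartbeatingAm(RmClient rm, long intervalMs) {
        this.rm = rm;
        this.intervalMs = intervalMs;
        this.allocator = new Thread(this::heartbeatLoop, "am-allocator");
    }

    void start() {
        allocator.start();
    }

    private void heartbeatLoop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                rm.allocate();
                // Throws immediately if the interrupt arrived during allocate().
                Thread.sleep(intervalMs);
            }
        } catch (InterruptedException e) {
            // Interrupted by unregister(): stop heartbeating.
        }
    }

    // Stop the allocate thread and wait for it to exit before unregistering,
    // so no allocate call can race with finishApplicationMaster.
    void unregister() throws InterruptedException {
        allocator.interrupt();
        allocator.join();
        rm.finishApplicationMaster();
    }
}
```

The alternative raised in the comment, handling a late allocate in the RM, would instead require the RM to remember that the attempt has already unregistered and to refuse the resync.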

> ApplicationMasterService should Resync with the AM upon allocate call after restart
> -----------------------------------------------------------------------------------
>                 Key: YARN-1366
>                 URL: https://issues.apache.org/jira/browse/YARN-1366
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Rohith
>         Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.patch, YARN-1366.prototype.patch,
> The ApplicationMasterService currently sends a resync response, to which the AM responds
> by shutting down. The AM behavior is expected to change to resyncing with the RM.
> Resync means resetting the allocate RPC sequence number to 0, after which the AM should send
> its entire set of outstanding requests to the RM. Note that if the AM is making its first
> allocate call to the RM, then things should proceed as normal without needing a resync. The RM
> will return all containers that have completed since the RM last synced with the AM. Some
> container completions may be reported more than once.
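The AM-side resync described in the issue can be sketched as below: reset the sequence number to 0, re-send every outstanding request instead of only the recent changes, and tolerate duplicate completion reports. This is an illustration under assumptions, not the YARN client API: `ResyncingAmState` and its methods are hypothetical, request keys are opaque strings, and values are absolute outstanding container counts.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical AM-side bookkeeping for the resync protocol.
// Keys are opaque request identifiers (assumption), e.g. "priority=1,resource=*".
class ResyncingAmState {
    private final Map<String, Integer> outstanding = new LinkedHashMap<>();
    private final Map<String, Integer> changedSinceLastCall = new LinkedHashMap<>();
    private final Set<String> completedSeen = new HashSet<>();
    private int sequence = 0;

    // Record the current outstanding count for a request key.
    void setOutstanding(String key, int count) {
        outstanding.put(key, count);
        changedSinceLastCall.put(key, count);
    }

    // Sequence number to attach to the next allocate call.
    int sequence() {
        return sequence;
    }

    // Asks for the next allocate call: only entries changed since the previous
    // call. Advances the sequence number.
    Map<String, Integer> asksForNextCall() {
        Map<String, Integer> asks = new LinkedHashMap<>(changedSinceLastCall);
        changedSinceLastCall.clear();
        sequence++;
        return asks;
    }

    // On a resync response: restart numbering at 0 and queue every outstanding
    // request, so the restarted RM sees the AM's full demand.
    void onResync() {
        sequence = 0;
        changedSinceLastCall.clear();
        changedSinceLastCall.putAll(outstanding);
    }

    // Completed containers may be reported more than once after a resync;
    // return only the ids not seen before.
    List<String> newlyCompleted(List<String> reported) {
        List<String> fresh = new ArrayList<>();
        for (String id : reported) {
            if (completedSeen.add(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }
}
```

Sending only changed entries on normal heartbeats and the full map after a resync mirrors the distinction the issue draws: the restarted RM has lost the AM's earlier requests, so a delta alone would under-report demand.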

This message was sent by Atlassian JIRA
