hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart
Date Mon, 19 May 2014 17:50:38 GMT

    [ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002059#comment-14002059 ]

Bikas Saha commented on YARN-1366:
----------------------------------

It would be easier for users if the RM simply accepted the first register from the app
and the last finishApplicationMaster() without needing a resync.
Let's say that app version 1 was running and we considered it lost because we lost network
communication with it. So the RM started version 2 of the app. Then the RM dies. Then network
connectivity for v1 is restored. Now both v1 and v2 are trying to make allocate calls to the
non-existent RM instance. When the RM comes back up, how does it differentiate between v1 and
v2, keep v2, and ask v1 to exit? Does this already work? Until now it may not have been a
problem because the RM would always ask both to exit and start a new v3.

> ApplicationMasterService should Resync with the AM upon allocate call after restart
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-1366
>                 URL: https://issues.apache.org/jira/browse/YARN-1366
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Rohith
>         Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.patch, YARN-1366.prototype.patch,
YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response, to which the AM responds
by shutting down. The AM behavior is expected to change to resyncing with the RM.
Resync means resetting the allocate RPC sequence number to 0, after which the AM should send
its entire outstanding request to the RM. Note that if the AM is making its first allocate call
to the RM, then things should proceed as normal without needing a resync. The RM will return all
containers that have completed since the RM last synced with the AM. Some container completions
may be reported more than once.
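
The resync behavior described above can be illustrated with a rough sketch. This is a toy model only, not the YARN implementation: ToyRm, ToyAm, the RESYNC constant, and the use of plain strings for resource asks are all assumptions made for illustration, and completed-container reporting is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the resync handshake described above. ToyRm and ToyAm are
// hypothetical names for illustration, not YARN classes.
class ToyRm {
    static final int RESYNC = -1;      // hypothetical "please resync" signal
    private int lastResponseId = -1;   // -1: the RM holds no state for this AM
    private final List<String> asks = new ArrayList<>();

    // Returns the next sequence number, or RESYNC if the RM has no state for
    // the AM and the AM is not at the start of the sequence.
    int allocate(int responseId, List<String> newAsks) {
        if (lastResponseId == -1 && responseId != 0) {
            return RESYNC;
        }
        asks.addAll(newAsks);
        lastResponseId = responseId + 1;
        return lastResponseId;
    }

    // Simulates an RM restart that forgets all per-AM state.
    void restart() {
        lastResponseId = -1;
        asks.clear();
    }

    List<String> knownAsks() {
        return new ArrayList<>(asks);
    }
}

class ToyAm {
    private int responseId = 0;
    // Every ask made so far; this toy never fulfils asks, so the list only grows.
    private final List<String> outstanding = new ArrayList<>();
    // Asks that have not been sent to the RM yet.
    private final List<String> pending = new ArrayList<>();

    void request(String ask) {
        outstanding.add(ask);
        pending.add(ask);
    }

    void heartbeat(ToyRm rm) {
        int next = rm.allocate(responseId, pending);
        if (next == ToyRm.RESYNC) {
            // Resync instead of shutting down: restart the sequence at 0 and
            // resend the entire outstanding request.
            responseId = 0;
            next = rm.allocate(responseId, outstanding);
        }
        responseId = next;
        pending.clear();
    }

    int responseId() {
        return responseId;
    }
}
```

In this sketch, a first allocate call with sequence number 0 is accepted without a resync, matching the note in the description. If the toy RM is restarted between heartbeats, the next heartbeat gets the resync signal, and the AM restarts its sequence at 0 and replays everything it still has outstanding, so the RM ends up with the full set of asks again.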



--
This message was sent by Atlassian JIRA
(v6.2#6252)
