hadoop-yarn-issues mailing list archives

From "Anubhav Dhoot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart
Date Mon, 19 May 2014 22:27:38 GMT

    [ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002499#comment-14002499 ]

Anubhav Dhoot commented on YARN-1366:

To summarize: along with the current changes in YARN-1365 (which sets the responseMap entry to -1 during recovery,
i.e. allows the latest known AM to register/finish on resync), we need two more changes:
a) Return SHUTDOWN instead of RESYNC when the responseMap has no entry (i.e. for any AM that is not known
to be the latest).
b) For known last AMs:
b.1) Allow finishApplicationMaster to succeed when the responseMap entry is set to -1 (i.e. not yet registered
but known to be the last).
b.2) Return RESYNC for every allocate call from a known AM that has not yet registered.
b.3) Allow register for a known AM after restart (already covered in 1365's current patch).
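
The rules above can be read as a small decision table keyed on the AM attempt's entry in responseMap. Below is a minimal sketch of that table, not the actual ApplicationMasterService code: the class, enum, and method names are hypothetical, attempt ids are plain strings, and an Integer stands in for whatever responseMap really stores (no entry means "not known to be the latest", -1 means "recovered as the last attempt but not yet registered"). Rejecting register/finish for an attempt with no entry is an assumption; the comment only specifies the allocate case.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of rules (a), (b.1), (b.2), (b.3); names are not from Hadoop.
final class ResyncRulesSketch {
    enum Outcome { SHUTDOWN, RESYNC, PROCEED }

    // Assumed marker written during recovery for the latest known attempt.
    static final int NOT_YET_REGISTERED = -1;

    // attemptId -> last response id; a missing entry means "not the latest known AM".
    private final Map<String, Integer> responseMap = new HashMap<>();

    // Recovery path (YARN-1365): remember the latest attempt as known but unregistered.
    void recoverLatestAttempt(String attemptId) {
        responseMap.put(attemptId, NOT_YET_REGISTERED);
    }

    Outcome allocate(String attemptId) {
        Integer last = responseMap.get(attemptId);
        if (last == null) {
            return Outcome.SHUTDOWN;            // (a) unknown AM
        }
        if (last == NOT_YET_REGISTERED) {
            return Outcome.RESYNC;              // (b.2) known AM, not yet registered
        }
        responseMap.put(attemptId, last + 1);   // normal heartbeat: advance the sequence
        return Outcome.PROCEED;
    }

    // (b.3) A known, unregistered AM may register; duplicates are rejected.
    boolean register(String attemptId) {
        Integer last = responseMap.get(attemptId);
        if (last == null || last != NOT_YET_REGISTERED) {
            return false;                       // unknown attempt (assumption) or already registered
        }
        responseMap.put(attemptId, 0);
        return true;
    }

    // (b.1) finish succeeds for any known AM, including one still marked -1.
    boolean finish(String attemptId) {
        return responseMap.containsKey(attemptId);
    }
}
```

With this sketch, a recovered attempt gets RESYNC on allocate until it registers, can still finish while marked -1, and an attempt with no entry gets SHUTDOWN.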

[~rohithsharma], let me know if you mind us adding these to [YARN-1365|https://issues.apache.org/jira/browse/YARN-1365] as well.
It's needed for fixing the unit test failures in 1365's current patch and will also keep it
consistent instead of split across patches. We can keep this patch for all the AM side of

> ApplicationMasterService should Resync with the AM upon allocate call after restart
> -----------------------------------------------------------------------------------
>                 Key: YARN-1366
>                 URL: https://issues.apache.org/jira/browse/YARN-1366
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Rohith
>         Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.patch, YARN-1366.prototype.patch,
> The ApplicationMasterService currently sends a resync response to which the AM responds
> by shutting down. The AM behavior is expected to change to resyncing with the RM.
> Resync means resetting the allocate RPC sequence number to 0, and the AM should send all of its
> outstanding requests to the RM. Note that if the AM is making its first allocate call to the
> RM then things should proceed like normal without needing a resync. The RM will return all
> containers that have completed since the RM last synced with the AM. Some container completions
> may be reported more than once.
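
The AM-side behavior described in the issue can be sketched as follows. This is an illustration under stated assumptions, not the AMRMClient implementation: the class and method names are hypothetical, requests and container ids are plain strings, and deduplicating completions with a set is one possible way for an AM to tolerate completions that are reported more than once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical AM-side sketch of resync handling; names are not from Hadoop.
final class AmResyncSketch {
    private int responseId = 0;                                    // allocate RPC sequence number
    private final List<String> outstanding = new ArrayList<>();   // every request not yet satisfied
    private final List<String> unsent = new ArrayList<>();        // requests not yet sent to the RM
    private final Set<String> completedSeen = new HashSet<>();    // completions already processed

    void addRequest(String ask) {
        outstanding.add(ask);
        unsent.add(ask);
    }

    // Requests to include in the next allocate call (normally only the delta).
    List<String> nextAllocatePayload() {
        List<String> payload = new ArrayList<>(unsent);
        unsent.clear();
        return payload;
    }

    // Called after a normal (non-resync) allocate response.
    void onAllocateSucceeded() {
        responseId++;
    }

    // Resync: reset the sequence number to 0 and queue ALL outstanding requests again.
    void onResync() {
        responseId = 0;
        unsent.clear();
        unsent.addAll(outstanding);
    }

    // Returns true only the first time a completion is seen, so repeats can be ignored.
    boolean recordCompletion(String containerId) {
        return completedSeen.add(containerId);
    }

    int currentResponseId() {
        return responseId;
    }
}
```

After `onResync()`, the next payload contains every outstanding request rather than just the unsent delta, which matches the "send all outstanding requests" step in the description.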

This message was sent by Atlassian JIRA