hadoop-yarn-issues mailing list archives

From "Rohith (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart
Date Mon, 19 May 2014 04:53:38 GMT

    [ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001385#comment-14001385 ]

Rohith commented on YARN-1366:

That's a good point to discuss and decide: should RESYNC be handled in a new API, or should
it use the existing method?

CMIIW, my point of view is:
1. A separate API for resync would eventually differentiate between an AM that is registered
with the RM and one that the RM launched but that has not registered. The RM treats both
applications the same in terms of app state transitions. The only gain from a new API is that
we can avoid updating some of the data structures (trackingURI).
2. bq. Are there any other advantage on the RM side by having this information come together
in 1 "atomic" operation?
   As I see it, the RM has no specific advantage in receiving the pending requests in one shot;
in fact, schedulers can serve requests faster. OTOH, AMs are relieved of the responsibility of
resending requests in batches (over the next heartbeats). Batch processing would require
defining what fraction of the requests is sent to the RM in each batch.
3. Consider the case where the AM's last heartbeat has been sent to the RM, and the RM restarts
before finishApplicationMaster() is called. Does the ApplicationMasterService send a resync?
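
To make the batching trade-off in point 2 concrete, here is a minimal sketch of how an AM
might split its pending requests into per-heartbeat batches. The class ResendBatcher, the
method split, and the fraction parameter are all hypothetical names for illustration only;
nothing here is taken from the YARN code base or from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of YARN: splits pending requests into batches.
final class ResendBatcher {

    // Each batch holds at most ceil(fraction * total) requests, and at least one.
    // fraction == 1.0 degenerates to the "one shot" resend.
    static <T> List<List<T>> split(List<T> pending, double fraction) {
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        List<List<T>> batches = new ArrayList<>();
        if (pending.isEmpty()) {
            return batches;
        }
        int batchSize = Math.max(1, (int) Math.ceil(pending.size() * fraction));
        for (int start = 0; start < pending.size(); start += batchSize) {
            int end = Math.min(start + batchSize, pending.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(pending.subList(start, end)));
        }
        return batches;
    }
}
```

With fraction = 1.0 the AM sends everything in a single allocate call. Any smaller value
spreads the resend over several heartbeats, and the AM then has to track which batches
have already gone out.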

> ApplicationMasterService should Resync with the AM upon allocate call after restart
> -----------------------------------------------------------------------------------
>                 Key: YARN-1366
>                 URL: https://issues.apache.org/jira/browse/YARN-1366
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Rohith
>         Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.patch, YARN-1366.prototype.patch,
> The ApplicationMasterService currently sends a resync response to which the AM responds
by shutting down. The AM behavior is expected to change to resyncing with the RM.
Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire
outstanding request to the RM. Note that if the AM is making its first allocate call to the
RM then things should proceed like normal without needing a resync. The RM will return all
containers that have completed since the RM last synced with the AM. Some container completions
may be reported more than once.
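
The resync behavior described above could be sketched on the AM side roughly as follows.
This is only an illustration under stated assumptions: ResyncingClient and all of its
methods are hypothetical, requests and container IDs are modeled as plain strings, and no
real YARN classes are used. It shows the three points from the description: the sequence
number goes back to 0, the entire outstanding request is resent, and duplicate container
completions are tolerated.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical AM-side state holder, not the actual YARN client.
final class ResyncingClient {
    private int lastResponseId = 0;
    // Every request that has not been satisfied yet.
    private final List<String> outstanding = new ArrayList<>();
    // Requests not yet sent to the RM (the delta for the next allocate call).
    private final List<String> unsent = new ArrayList<>();
    // Completed container IDs already processed, used to drop duplicates.
    private final Set<String> seenCompletions = new HashSet<>();

    void addAsk(String ask) {
        outstanding.add(ask);
        unsent.add(ask);
    }

    // Returns what the next allocate call would carry and marks it as sent.
    List<String> nextAllocatePayload() {
        List<String> payload = new ArrayList<>(unsent);
        unsent.clear();
        return payload;
    }

    void onResponse(int responseId) {
        lastResponseId = responseId;
    }

    // On a resync signal: reset the sequence number and queue the whole
    // outstanding request again, instead of shutting down.
    void onResync() {
        lastResponseId = 0;
        unsent.clear();
        unsent.addAll(outstanding);
    }

    // Returns true only the first time a completion is seen.
    boolean recordCompletion(String containerId) {
        return seenCompletions.add(containerId);
    }

    int lastResponseId() {
        return lastResponseId;
    }
}
```

The first allocate call needs no special handling in this sketch, since the sequence
number already starts at 0 and everything outstanding is still unsent.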

This message was sent by Atlassian JIRA