hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1410) Handle RM fails over after getApplicationID() and before submitApplication().
Date Sat, 08 Mar 2014 04:40:44 GMT

    [ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924713#comment-13924713 ]

Vinod Kumar Vavilapalli commented on YARN-1410:
-----------------------------------------------

Okay, that makes sense; we can't break existing apps because of this.

Restating for others who are listening: this patch doesn't add any more code than what is
already present w.r.t. the handling of app IDs. The original statement in the description
bq. Since the new RM has a different notion of cluster timestamp (used to create app id) the
new RM may reject the app submission resulting in unexpected failure on the client side.
clearly doesn't happen at present (before the patch) because the RM does not validate app IDs.
When we do add that validation, the solution is to make the active and standby RMs recognize
the cluster timestamps of (at least some of) their own past generations as well as those of
the other RMs, possibly through state-store persistence.
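A minimal sketch of that validation idea follows, assuming a hypothetical shared timestamp store: each RM records its cluster timestamp when it becomes active, and any RM accepts an app ID whose timestamp appears in the store. The class names (AppId, TimestampStore, ClusterTimestampValidator) are illustrative only and are not YARN APIs. The store is an in-memory set standing in for real state-store persistence, and it keeps every generation, whereas a real design might retain only "some" past generations.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for an application ID: the issuing RM's cluster
// timestamp plus a per-RM sequence number (not YARN's ApplicationId class).
final class AppId {
    final long clusterTimestamp;
    final int id;

    AppId(long clusterTimestamp, int id) {
        this.clusterTimestamp = clusterTimestamp;
        this.id = id;
    }
}

// Hypothetical shared store of known RM generations. Modelled in memory here;
// the proposal is to persist this via the RM state store so that it survives
// failover and is visible to both active and standby RMs.
final class TimestampStore {
    private final Set<Long> timestamps = new HashSet<>();

    void record(long clusterTimestamp) {
        timestamps.add(clusterTimestamp);
    }

    boolean contains(long clusterTimestamp) {
        return timestamps.contains(clusterTimestamp);
    }
}

// Hypothetical validator: accepts IDs minted by this RM, by any earlier
// generation, or by a peer RM, as long as the timestamp was recorded.
final class ClusterTimestampValidator {
    private final TimestampStore store;

    ClusterTimestampValidator(TimestampStore store, long ownClusterTimestamp) {
        this.store = store;
        // "Becoming active": remember this generation for the RMs that follow.
        store.record(ownClusterTimestamp);
    }

    boolean isValid(AppId appId) {
        return store.contains(appId.clusterTimestamp);
    }
}
```

Because both RMs share the store, an ID minted before a failover still validates afterwards, while an ID carrying a timestamp that no RM ever recorded is rejected.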

The existing patch looks fine enough to me. Checking this in.

> Handle RM fails over after getApplicationID() and before submitApplication().
> -----------------------------------------------------------------------------
>
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.10.patch,
>                      YARN-1410.10.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch,
>                      YARN-1410.4.patch, YARN-1410.5.patch, YARN-1410.6.patch, YARN-1410.7.patch,
>                      YARN-1410.8.patch, YARN-1410.9.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the RM.
> The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create app id), the new RM may reject the app submission, resulting in unexpected failure on the client side.
> The same may happen for other two-step client API operations.
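The two-step flow described above, and how app ID validation would break it across a failover, can be sketched as a small simulation. Everything here is hypothetical: SimAppId and SimResourceManager only mimic the shape of an app ID (cluster timestamp plus sequence number) and of an RM. In the real client API the two steps correspond roughly to YarnClient#createApplication() followed by YarnClient#submitApplication(...). The strictValidation flag models an RM that rejects IDs it did not mint itself. With the flag off, which matches the current behaviour (no app ID validation in the RM), the submission goes through.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical app ID: issuing RM's cluster timestamp plus a sequence number.
final class SimAppId {
    final long clusterTimestamp;
    final int id;

    SimAppId(long clusterTimestamp, int id) {
        this.clusterTimestamp = clusterTimestamp;
        this.id = id;
    }
}

// Hypothetical RM. strictValidation = true rejects IDs from other RM
// generations; false accepts any ID, as the RM does today.
final class SimResourceManager {
    private final long clusterTimestamp;
    private final boolean strictValidation;
    private final AtomicInteger nextId = new AtomicInteger(1); // thread-safe sequence

    SimResourceManager(long clusterTimestamp, boolean strictValidation) {
        this.clusterTimestamp = clusterTimestamp;
        this.strictValidation = strictValidation;
    }

    // Step 1: the client asks this RM for a new application ID.
    SimAppId newApplicationId() {
        return new SimAppId(clusterTimestamp, nextId.getAndIncrement());
    }

    // Step 2: the client submits the application using that ID.
    boolean submitApplication(SimAppId appId) {
        if (strictValidation && appId.clusterTimestamp != clusterTimestamp) {
            return false; // rejected: ID was minted by a different RM generation
        }
        return true;
    }

    // Runs both client steps, with a failover between them when before != after.
    static boolean submitAcrossFailover(SimResourceManager before, SimResourceManager after) {
        SimAppId appId = before.newApplicationId(); // step 1 on the old active RM
        // failover happens here; the client is redirected to the new active RM
        return after.submitApplication(appId);      // step 2 on the new active RM
    }
}
```

With strictValidation off, the cross-failover submission is accepted. With it on, the submission is rejected, which is the unexpected client-side failure the description warns about.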



--
This message was sent by Atlassian JIRA
(v6.2#6252)
