hadoop-yarn-issues mailing list archives

From "Xuan Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
Date Tue, 25 Feb 2014 18:14:22 GMT

    [ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911817#comment-13911817 ]

Xuan Gong commented on YARN-1410:

Sounds good to me.
For 1) RM fails over after getApplicationID() and *before* submitApplication().

The change we will make is to let the RM accept the “old” applicationId, which includes:
* Make the RM accept the applicationId in the context.
* If no applicationId is specified in the context, the RM will assign a new ApplicationId.
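
A minimal sketch of the two rules above, assuming nothing about the actual patch. AppIdSketch, AppId, ResourceManager, and submit() are hypothetical stand-ins, not the real YARN classes; the point is only that an ID issued by the old RM is honored by the new RM even though the cluster timestamps differ.

```java
import java.util.Optional;

// Hypothetical stand-ins for illustration; NOT the real Hadoop YARN classes.
final class AppIdSketch {

    /** Simplified application ID: cluster timestamp plus a sequence number. */
    static final class AppId {
        final long clusterTimestamp;
        final int id;

        AppId(long clusterTimestamp, int id) {
            this.clusterTimestamp = clusterTimestamp;
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof AppId)) {
                return false;
            }
            AppId other = (AppId) o;
            return clusterTimestamp == other.clusterTimestamp && id == other.id;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(clusterTimestamp) * 31 + id;
        }
    }

    /** Simplified RM that honors an ID from the context or assigns a new one. */
    static final class ResourceManager {
        private final long clusterTimestamp;
        private int nextId = 1;

        ResourceManager(long clusterTimestamp) {
            this.clusterTimestamp = clusterTimestamp;
        }

        AppId newApplicationId() {
            return new AppId(clusterTimestamp, nextId++);
        }

        /** Accept the ID carried in the context; assign a fresh one if absent. */
        AppId submit(Optional<AppId> idInContext) {
            return idInContext.orElseGet(this::newApplicationId);
        }
    }
}
```

In this sketch the new RM never compares the incoming ID against its own cluster timestamp, which is the check that causes the rejection described in the issue.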

For 2) RM fails over *during* the submitApplication call.

We have had many discussions about this scenario.  We can open a separate ticket for it.

For 3) RM fails over *after* the submitApplication call and before the subsequent getApplicationReport().

We can mark getApplicationReport() as Idempotent, and we need to handle two different cases:
* Failover happens after SubmitApplicationResponse is received, but RMStateStore has not saved the applicationState. In this case, when getApplicationReport() is called, we will get an ApplicationNotFoundException. So we need to catch this exception and submit the application again.
* Failover happens after SubmitApplicationResponse is received, and RMStateStore has saved the applicationState. Nothing needs to be changed.
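
The first case above could look roughly like this on the client side. This is a sketch under stated assumptions, not the patch itself: ResubmitSketch, RmClient, AppNotFoundException, and reportWithResubmit() are hypothetical names, and the exception is unchecked here only to keep the example short.

```java
// Hypothetical stand-ins for illustration; NOT the real Hadoop YARN client API.
final class ResubmitSketch {

    /** Stand-in for ApplicationNotFoundException (unchecked for brevity). */
    static final class AppNotFoundException extends RuntimeException {
        AppNotFoundException(String appId) {
            super("Application not found: " + appId);
        }
    }

    /** Minimal view of the RM as seen by the client. */
    interface RmClient {
        void submitApplication(String appId);

        String getApplicationReport(String appId);
    }

    /**
     * Fetch the report; if the (possibly failed-over) RM does not know the
     * application, submit it again under the same ID and retry once.
     */
    static String reportWithResubmit(RmClient rm, String appId) {
        try {
            return rm.getApplicationReport(appId);
        } catch (AppNotFoundException e) {
            // The state store never saved the application: resubmit it.
            rm.submitApplication(appId);
            return rm.getApplicationReport(appId);
        }
    }
}
```

When the state was saved before the failover, the first getApplicationReport() call succeeds and the catch branch is never taken, which matches the second case.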

> Handle client failover during 2 step client API's like app submission
> ---------------------------------------------------------------------
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch
>   Original Estimate: 48h
>  Remaining Estimate: 48h
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the user.
> The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side.
> The same may happen for other 2 step client API operations.

