hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
Date Tue, 25 Feb 2014 01:56:23 GMT

    [ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911105#comment-13911105 ]

Vinod Kumar Vavilapalli commented on YARN-1410:

Finally on to this.

There are three types of fail-over conditions w.r.t submission:
 # RM fails over after getApplicationID() and *before* submitApplication().
 # RM fails over *during* the submitApplication() call.
 # RM fails over *after* the submitApplication call and before the subsequent getApplicationReport().

This JIRA started to solve (1) above (as described in the description) and completely degenerated
into (2).

In the interest of making progress, can we focus only on (1) here and track (2) and (3) separately?
(1) itself has implications on the user APIs depending on the implementation. I had looked at
a few of the very early patches and I believe Xuan was trying to solve those in this JIRA.

> Handle client failover during 2 step client API's like app submission
> ---------------------------------------------------------------------
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch
>   Original Estimate: 48h
>  Remaining Estimate: 48h
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the RM.
> The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side.
> The same may happen for other 2 step client API operations.
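
To make the failure in the issue description concrete, here is a minimal Java sketch of condition (1): the client gets an app id from one RM and submits it to another after a fail-over. This is a toy model, not YARN code. ToyAppId, ToyRM and the rule "reject any id whose timestamp differs from mine" are assumptions made for illustration only; the real RM's validation may differ.

```java
// Toy model only: ToyAppId and ToyRM are hypothetical, not YARN classes.
class FailoverSketch {

    /** App id built from the issuing RM's cluster timestamp plus a sequence number. */
    static final class ToyAppId {
        final long clusterTimestamp;
        final int sequence;

        ToyAppId(long clusterTimestamp, int sequence) {
            this.clusterTimestamp = clusterTimestamp;
            this.sequence = sequence;
        }

        @Override
        public String toString() {
            return "app_" + clusterTimestamp + "_" + sequence;
        }
    }

    /** RM that accepts only ids carrying its own cluster timestamp (assumed rule). */
    static final class ToyRM {
        private final long clusterTimestamp;
        private int nextSequence = 1;

        ToyRM(long clusterTimestamp) {
            this.clusterTimestamp = clusterTimestamp;
        }

        /** Step 1 of the two-step API: hand out a new app id. */
        ToyAppId newApplicationId() {
            return new ToyAppId(clusterTimestamp, nextSequence++);
        }

        /** Step 2: accept the submission only if this RM instance issued the id. */
        boolean submit(ToyAppId id) {
            return id.clusterTimestamp == clusterTimestamp;
        }
    }

    public static void main(String[] args) {
        ToyRM originalRm = new ToyRM(1000L);
        ToyAppId id = originalRm.newApplicationId();

        // Fail-over: a new RM becomes active with a different cluster timestamp.
        ToyRM newRm = new ToyRM(2000L);

        System.out.println(id + " accepted by original RM: " + originalRm.submit(id));
        System.out.println(id + " accepted by new RM: " + newRm.submit(id));
    }
}
```

The sketch only covers the window between the two steps. It says nothing about a fail-over during or after the submit call, which are conditions (2) and (3) in the comment above.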

This message was sent by Atlassian JIRA
