hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
Date Tue, 25 Feb 2014 02:42:23 GMT

    [ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911156#comment-13911156 ]

Bikas Saha commented on YARN-1410:
----------------------------------

Sounds good.

Let's track 2) in a separate new jira. Xuan, can you please open one?

For 1), I believe the change would be limited to allowing the new RM to accept an unknown
application id in submitApplication(), under the assumption that the previous RM generated the
app id and then died either a) before the client even attempted to submit, or b) before saving
the app in the store, so that the client is now retrying against the new RM.
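As a rough illustration of the relaxed check in 1), the minimal Java sketch below models an RM that either rejects or accepts an app id generated by a different RM incarnation. Everything in it is hypothetical: AppId, FailoverTolerantRm and the acceptUnknownIds flag are illustrative names, not YARN classes. It also assumes, as the issue description suggests, that the strict behaviour is to reject an id whose cluster timestamp differs from the RM's own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified app id: the cluster timestamp of the RM that
// generated it, plus a per-RM sequence number.
record AppId(long clusterTimestamp, int sequence) {}

// Hypothetical RM model for illustration; not a YARN class.
class FailoverTolerantRm {
    private final long clusterTimestamp;
    private final boolean acceptUnknownIds;
    private final Map<AppId, String> apps = new HashMap<>();

    FailoverTolerantRm(long clusterTimestamp, boolean acceptUnknownIds) {
        this.clusterTimestamp = clusterTimestamp;
        this.acceptUnknownIds = acceptUnknownIds;
    }

    // Returns true if the submission is accepted.
    boolean submitApplication(AppId id, String submissionContext) {
        boolean issuedByThisRm = id.clusterTimestamp() == clusterTimestamp;
        if (!issuedByThisRm && !acceptUnknownIds) {
            // Strict behaviour: the id came from another RM incarnation
            // (e.g. the RM that just failed over), so reject it.
            return false;
        }
        // Relaxed behaviour: assume the previous RM generated the id and died
        // before the client submitted or before the app was saved.
        apps.put(id, submissionContext);
        return true;
    }

    boolean isKnown(AppId id) {
        return apps.containsKey(id);
    }
}
```

The relaxed RM simply records the id; it does not yet deal with an id that the previous RM had already saved, which is the conflict case discussed further down in this comment.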

We can remove the idempotent etc. annotations and keep the change limited to the initial
proposal: 1) create a new API that accepts an app-submission-context in which the user does not
supply the app id, and 2) allow the RM to accept an app-submission-context that has an unknown
app id. This is based on the comment at https://issues.apache.org/jira/browse/YARN-1410?focusedCommentId=13864516&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13864516
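A minimal sketch of proposal 1), again with hypothetical names (SubmissionContext, SingleStepRm, submit) rather than the actual YARN client API: the context carries no app id, and the RM allocates the id and records the app in a single call, so the client never holds an id issued by an RM that may since have failed over.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified app id (cluster timestamp + sequence number).
record AppId(long clusterTimestamp, int sequence) {}

// Hypothetical submission context: note that it has no app id field.
record SubmissionContext(String appName, String queue) {}

// Hypothetical RM model for illustration; not a YARN class.
class SingleStepRm {
    private final long clusterTimestamp;
    private int nextSequence = 1;
    private final Map<AppId, SubmissionContext> apps = new LinkedHashMap<>();

    SingleStepRm(long clusterTimestamp) {
        this.clusterTimestamp = clusterTimestamp;
    }

    // One-step submission: the RM allocates the id itself and stores the
    // app in the same call, then returns the id to the client.
    AppId submit(SubmissionContext ctx) {
        AppId id = new AppId(clusterTimestamp, nextSequence++);
        apps.put(id, ctx);
        return id;
    }

    SubmissionContext lookup(AppId id) {
        return apps.get(id);
    }
}
```

One caveat the sketch does not address: if the RM stores the app but its response is lost, a blind client retry of a one-step call would create a second application.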

The solution will be incomplete, since the old RM could have saved the state, and the new RM
would then find a conflicting app-submission request with an existing app id. That's why we
branched off into that discussion. For now, we handle this as follows: 1) if the state of the
existing app is NEW, just accept the submitApplication() (effectively emulating the RetryCache);
2) if the state of the app is not NEW, fail the submitApplication(). Alternatively, we could
choose to solve this in the new jira being created.
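The interim rule above can be sketched roughly as follows. This is a simplified, hypothetical model: app ids are plain strings, the AppState enum is a small illustrative subset rather than YARN's actual app state machine, and recover() stands in for the new RM loading state that the old RM had saved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, reduced set of app states for illustration only.
enum AppState { NEW, SUBMITTED, RUNNING, FINISHED }

// Hypothetical RM model for illustration; not a YARN class.
class ConflictAwareRm {
    private final Map<String, AppState> store = new HashMap<>();

    // Stands in for the new RM recovering an app the old RM had saved.
    void recover(String appId, AppState state) {
        store.put(appId, state);
    }

    // Returns true if the submission is accepted.
    boolean submitApplication(String appId) {
        AppState existing = store.get(appId);
        if (existing == null) {
            // Unknown id: the old RM issued it but never saved the app.
            store.put(appId, AppState.NEW);
            return true;
        }
        if (existing == AppState.NEW) {
            // Same id and the app is still NEW: treat it as a retried
            // submission and accept without creating anything (RetryCache-like).
            return true;
        }
        // The app has already moved past NEW: fail the submission.
        return false;
    }

    AppState stateOf(String appId) {
        return store.get(appId);
    }
}
```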



> Handle client failover during 2 step client API's like app submission
> ---------------------------------------------------------------------
>
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch,
> YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the RM.
> The client may have obtained an appId from an RM, the RM may have failed over, and the
> client may then submit the app to the new RM.
> Since the new RM has a different notion of the cluster timestamp (used to create the app id),
> it may reject the app submission, resulting in an unexpected failure on the client side.
> The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
