hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
Date Wed, 15 Jan 2014 22:32:27 GMT

    [ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872709#comment-13872709 ]

Karthik Kambatla commented on YARN-1410:
----------------------------------------

bq. But, I am not sure why we need to use these annotation in RM restart.
Adding to what Bikas has already said: consider the RM restarting while an app submission is in flight.
The possible outcomes are exactly the same as in the failover case.

bq. Also, the AtMostOnce and Idempotent annotation are only used when RetryDecision is FAILOVER_AND_RETRY.
So, this is another reason why we do not have them in RM restart case (For the RM restart,
the valid RetryDecision is RETRY).
Precisely my point. IMO, RM restart should not be directly using RetryUptoFixedTime or whatever
it is currently using. Eventually, when RetryPolicy#shouldRetry is called, the annotations
should be taken into account. 
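
To make the point concrete, here is a minimal, self-contained sketch of a retry decision that consults the annotations whether the RM restarted (RETRY) or failed over (FAILOVER_AND_RETRY). Everything in it is an assumption for illustration: the class RetrySketch, the toy protocol, its method names, and the stand-in annotations are hypothetical and are not Hadoop's actual RetryPolicy API, nor do they reflect how YARN annotates its real protocols.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class RetrySketch {

    /** Hypothetical marker: replaying the call is harmless. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Idempotent {}

    /** Hypothetical marker: the server is assumed to deduplicate replays. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface AtMostOnce {}

    public enum Decision { FAIL, RETRY, FAILOVER_AND_RETRY }

    /** Toy protocol; the annotations on it are illustrative only. */
    public interface ToyClientProtocol {
        @Idempotent
        String getStatus();

        // Deliberately left unannotated to show the unsafe case.
        void submitApp(String appId);
    }

    /** Looks up a method on the toy protocol by name and parameter types. */
    public static Method method(String name, Class<?>... params) {
        try {
            return ToyClientProtocol.class.getMethod(name, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** A call may be replayed only if it carries one of the two markers. */
    public static boolean isSafeToReplay(Method m) {
        return m.isAnnotationPresent(Idempotent.class)
                || m.isAnnotationPresent(AtMostOnce.class);
    }

    /**
     * The same annotation check applies to both paths: failover == false
     * models a restart of the same RM, failover == true models a failover.
     */
    public static Decision shouldRetry(Method m,
                                       boolean mayHaveReachedServer,
                                       boolean failover) {
        if (mayHaveReachedServer && !isSafeToReplay(m)) {
            return Decision.FAIL; // replay could apply the call twice
        }
        return failover ? Decision.FAILOVER_AND_RETRY : Decision.RETRY;
    }

    public static void main(String[] args) {
        Method status = method("getStatus");
        Method submit = method("submitApp", String.class);
        System.out.println(shouldRetry(status, true, false)); // RETRY
        System.out.println(shouldRetry(status, true, true));  // FAILOVER_AND_RETRY
        System.out.println(shouldRetry(submit, true, false)); // FAIL
        System.out.println(shouldRetry(submit, true, true));  // FAIL
    }
}
```

In this sketch the unannotated call is refused on both the restart and the failover path, which is the symmetry argued for above; only the action taken for a safe call differs.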

> Handle client failover during 2 step client API's like app submission
> ---------------------------------------------------------------------
>
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the RM.
> The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side.
> The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
