hadoop-yarn-issues mailing list archives

From "Xuan Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
Date Mon, 24 Feb 2014 21:51:21 GMT

    [ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910860#comment-13910860 ]

Xuan Gong commented on YARN-1410:

bq. I think the response of ClientRMService#submitApplication() should tell us whether the
submission is successful or not. If that is not the case, we should probably fix that first.

What I am trying to say is this: for an HDFS delete operation, if we get a response, it tells
us whether the delete finished successfully or not. If the failover happens after we receive
the response and we then retry the operation, we can easily tell that the previous attempt
succeeded and avoid doing it again.

But even if we get the response from ClientRMService#submitApplication(), we cannot tell whether
the application was submitted successfully or not. If a failover happens, there are several
cases that we need to handle differently, in particular depending on whether the appState has
been saved in the RMStateStore or not.
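
To make the two main cases concrete, here is a minimal sketch. It is a toy model, not YARN
code and not what the attached patches do: StateStore, Rm, submit() and retryAfterFailover()
are hypothetical names, and the "DUPLICATE"/"ACCEPTED" results are only an illustration of
how a retried submission could be told apart depending on whether the app state was
persisted before the failover.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model, not the YARN API: a retried submission after failover
// behaves differently depending on whether the app state was persisted.
class SubmitRetrySketch {
    /** Stand-in for the RMStateStore: its contents survive a failover. */
    static class StateStore {
        final Set<String> savedApps = new HashSet<>();
    }

    /** Stand-in for one RM instance; its in-memory state is lost on failover. */
    static class Rm {
        final StateStore store;
        final Set<String> knownApps = new HashSet<>();

        Rm(StateStore store) {
            this.store = store;
            // A newly active RM recovers whatever was persisted.
            knownApps.addAll(store.savedApps);
        }

        /** Returns "DUPLICATE" if the app is already known, else "ACCEPTED". */
        String submit(String appId) {
            if (knownApps.contains(appId)) {
                return "DUPLICATE"; // an earlier attempt was persisted
            }
            knownApps.add(appId);
            store.savedApps.add(appId);
            return "ACCEPTED";
        }
    }

    /** Simulates: (maybe) submit on the active RM, fail over, retry on the new RM. */
    static String retryAfterFailover(boolean savedBeforeFailover) {
        StateStore store = new StateStore();
        Rm active = new Rm(store);
        String appId = "app_1";
        if (savedBeforeFailover) {
            active.submit(appId); // state reached the store, response was lost
        }
        Rm newActive = new Rm(store); // failover: new RM loads the store
        return newActive.submit(appId);
    }

    public static void main(String[] args) {
        System.out.println(retryAfterFailover(true));  // appState was saved
        System.out.println(retryAfterFailover(false)); // appState was not saved
    }
}
```

In this model the client can safely retry in both cases, but only because the new RM
recognizes the recovered app. Without that check, the first case would look like a second,
conflicting submission.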

> Handle client failover during 2 step client API's like app submission
> ---------------------------------------------------------------------
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch
>   Original Estimate: 48h
>  Remaining Estimate: 48h
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the user.
> The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side.
> The same may happen for other 2 step client API operations.
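
The failure described in the issue above can be sketched as follows. This is a simplified
model under stated assumptions, not the YARN implementation: AppId, Rm, getNewApplicationId()
and submitApplication() here are hypothetical stand-ins, and the model assumes that an RM
rejects any app id whose cluster timestamp differs from its own.

```java
// Hypothetical model of the two-step flow; names are illustrative only.
class TwoStepSubmitSketch {
    /** An app id embeds the cluster timestamp of the RM that issued it. */
    static class AppId {
        final long clusterTimestamp;
        final int sequence;

        AppId(long clusterTimestamp, int sequence) {
            this.clusterTimestamp = clusterTimestamp;
            this.sequence = sequence;
        }
    }

    static class Rm {
        final long clusterTimestamp; // assumed to be fixed when the RM starts
        private int nextSequence = 1;

        Rm(long clusterTimestamp) {
            this.clusterTimestamp = clusterTimestamp;
        }

        /** Step 1: hand out a new app id. */
        AppId getNewApplicationId() {
            return new AppId(clusterTimestamp, nextSequence++);
        }

        /** Step 2: accept the submission only if this RM's timestamp matches (assumption). */
        boolean submitApplication(AppId id) {
            return id.clusterTimestamp == clusterTimestamp;
        }
    }

    public static void main(String[] args) {
        Rm oldRm = new Rm(1000L);
        Rm newRm = new Rm(2000L); // became active after failover

        AppId id = oldRm.getNewApplicationId();          // step 1 on the old RM
        System.out.println(oldRm.submitApplication(id)); // no failover: accepted
        System.out.println(newRm.submitApplication(id)); // after failover: rejected
    }
}
```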

This message was sent by Atlassian JIRA