hadoop-yarn-issues mailing list archives

From "Xuan Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.
Date Mon, 03 Mar 2014 19:46:29 GMT

    [ https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918499#comment-13918499

Xuan Gong commented on YARN-1764:

Let us continue our discussions on case 3: Handle RM fail overs after the submitApplication

Reply to [~kkambatl]'s comment:
“ I don't see 3 to be as straight-forward, and suspect would require revisiting the state

We will only consider the case where failover happens after the submitApplication call. That
means that when failover happens, we have already received the SubmitApplicationResponse.

When the failover happens, we will *not re-enter* clientRMService#submitApplication().
What happens next is that getApplicationReport() starts to execute. YarnClient
retries until it finds the next active RM, and then continues executing getApplicationReport().
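
To make the retry behavior concrete, here is a minimal, self-contained sketch of a client that keeps calling getApplicationReport and moves on to the next RM whenever the one it contacted is not active. It does not use the real YARN classes: RmEndpoint, StandbyException, and getReportWithFailover are hypothetical names introduced only for illustration, and the actual YarnClient retry logic may differ.

```java
import java.util.List;

public class FailoverRetrySketch {
    /** Hypothetical stand-in for one RM endpoint; not a YARN API. */
    public interface RmEndpoint {
        String getApplicationReport(String appId) throws StandbyException;
    }

    /** Hypothetical: signals that the contacted RM is not the active one. */
    public static class StandbyException extends Exception {
        public StandbyException(String msg) { super(msg); }
    }

    /** Calls getApplicationReport, switching to the next RM on each standby answer. */
    public static String getReportWithFailover(List<RmEndpoint> rms, String appId, int maxAttempts) {
        int current = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return rms.get(current).getApplicationReport(appId);
            } catch (StandbyException e) {
                // This RM is not active: fail over to the next one and retry.
                current = (current + 1) % rms.size();
            }
        }
        throw new IllegalStateException("no active RM found after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        RmEndpoint rm1 = appId -> { throw new StandbyException("rm1 is standby"); };
        RmEndpoint rm2 = appId -> "RUNNING:" + appId;
        // rm1 rejects the call, so the client retries against rm2.
        System.out.println(getReportWithFailover(List.of(rm1, rm2), "application_1", 4));
    }
}
```

Note that only getApplicationReport is retried here; submitApplication is never called again, which matches the assumption stated above.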

Now we have two cases to handle:
* Case 1: RMStateStore has already saved the ApplicationState when failover happens.
* Case 2: RMStateStore has not yet saved the ApplicationState when failover happens.

For case 1, we do not need to make any changes.
For case 2, if failover happens, then when we try to execute getApplicationReport we will get an
ApplicationNotFoundException. I think this is the only case we should handle here.
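
The difference between the two cases can be modeled with a toy state store. The sketch below assumes the newly active RM knows only the applications it recovers from the shared store; ToyRm, AppNotFoundException, and reportAfterFailover are hypothetical names used for illustration, not YARN classes, and the sketch shows only the symptom, not a proposed fix.

```java
import java.util.HashSet;
import java.util.Set;

public class StateStoreFailoverSketch {
    /** Hypothetical stand-in for YARN's ApplicationNotFoundException. */
    public static class AppNotFoundException extends Exception {
        public AppNotFoundException(String appId) { super("unknown application " + appId); }
    }

    /** Toy RM that knows only the applications recovered from the state store. */
    public static class ToyRm {
        private final Set<String> recoveredApps = new HashSet<>();

        public ToyRm(Set<String> stateStore) {
            recoveredApps.addAll(stateStore);
        }

        public String getApplicationReport(String appId) throws AppNotFoundException {
            if (!recoveredApps.contains(appId)) {
                throw new AppNotFoundException(appId);
            }
            return "ACCEPTED";
        }
    }

    /** Simulates a failover and returns what the client sees from the new active RM. */
    public static String reportAfterFailover(boolean savedBeforeFailover) {
        Set<String> stateStore = new HashSet<>();
        String appId = "application_0001";
        if (savedBeforeFailover) {
            stateStore.add(appId); // case 1: the state was persisted in time
        }
        ToyRm newActiveRm = new ToyRm(stateStore); // failover: new RM recovers from the store
        try {
            return newActiveRm.getApplicationReport(appId);
        } catch (AppNotFoundException e) {
            return "NOT_FOUND"; // case 2: the new RM has never heard of the application
        }
    }

    public static void main(String[] args) {
        System.out.println("case 1: " + reportAfterFailover(true));  // prints "case 1: ACCEPTED"
        System.out.println("case 2: " + reportAfterFailover(false)); // prints "case 2: NOT_FOUND"
    }
}
```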

> Handle RM fail overs after the submitApplication call.
> ------------------------------------------------------
>                 Key: YARN-1764
>                 URL: https://issues.apache.org/jira/browse/YARN-1764
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong

This message was sent by Atlassian JIRA