hadoop-yarn-issues mailing list archives

From "Naren Koneru (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493
Date Wed, 26 Feb 2014 22:37:25 GMT

    [ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913633#comment-13913633 ]

Naren Koneru commented on YARN-1577:
------------------------------------

After digging through the details, here is the summary as I understand it (sorry for any
repetition).

- Today, the unmanaged client (llama) sends a request to launch the AM, waits for the app
state to become ACCEPTED, and then registers the AM using AMRMClientAsync.registerApplicationMaster.

- This register call expects the AMRM token, which is part of the application report, to be
set. The client gets this token by calling ApplicationClientProtocol.getApplicationReport after
the app is accepted.

With the change in YARN-1493, this is broken: the AppAttempt is now launched after the application
is accepted, so the token may not yet be set when the app reaches ACCEPTED. The client can
therefore hit a race condition depending on when it fetches the application report. The temporary
hack we made in the client is to retry a fixed number of times.
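
For illustration, the retry workaround could look roughly like the sketch below. This is not
the actual client code: AmRmTokenPoller, waitForToken and the fetchToken supplier are hypothetical
names, and fetchToken stands in for "call getApplicationReport and read the AMRM token from the
report", returning null while the token is not yet set.

```java
import java.util.function.Supplier;

/** Sketch of the "retry a fixed number of times" workaround; not the actual client code. */
final class AmRmTokenPoller {
    private AmRmTokenPoller() {}

    /**
     * Polls fetchToken until it returns non-null or maxAttempts is exhausted.
     * fetchToken is a hypothetical stand-in for fetching the application report
     * and reading its AMRM token; it returns null while the token is unset.
     */
    static <T> T waitForToken(Supplier<T> fetchToken, int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T token = fetchToken.get();
            if (token != null) {
                return token; // the token is now set, so registration can proceed
            }
            if (attempt < maxAttempts) {
                sleepQuietly(sleepMillis); // wait before asking again
            }
        }
        throw new IllegalStateException("AMRM token not set after " + maxAttempts + " attempts");
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("Interrupted while waiting for the AMRM token", e);
        }
    }
}
```

The weakness is visible in the signature: maxAttempts and sleepMillis are guesses about how long
the attempt takes to launch, so the race is only made less likely, not removed.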

One way to solve this could be:
- Add an attempt state to the ApplicationReport (returned by ApplicationClientProtocol.getApplicationReport),
so the client can wait for the attempt state to be launched before proceeding with the UAM
registration.

- However, this would not be backwards compatible, since it requires changes to the unmanaged
clients. Since I do not see any documentation for the unmanaged clients, is this acceptable?
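
To make the proposal concrete, here is a minimal sketch of the check the client would perform.
Everything in it is an assumption for illustration: the AppState and AttemptState enums, the
Report class and readyToRegister are hypothetical and only model an application report that
carries an attempt state; they are not existing YARN types.

```java
/** Sketch of gating UAM registration on a proposed attempt state; all types are hypothetical. */
final class UamRegistrationGate {
    private UamRegistrationGate() {}

    /** Hypothetical subset of application states. */
    enum AppState { SUBMITTED, ACCEPTED, RUNNING }

    /** Hypothetical subset of attempt states that the report would expose. */
    enum AttemptState { NEW, SCHEDULED, LAUNCHED }

    /** Hypothetical stand-in for an application report extended with an attempt state. */
    static final class Report {
        final AppState appState;
        final AttemptState attemptState;

        Report(AppState appState, AttemptState attemptState) {
            this.appState = appState;
            this.attemptState = attemptState;
        }
    }

    /**
     * Returns true only when the app is accepted AND its attempt has been launched.
     * Checking ACCEPTED alone is what races with the attempt launch.
     */
    static boolean readyToRegister(Report report) {
        return report.appState == AppState.ACCEPTED
                && report.attemptState == AttemptState.LAUNCHED;
    }
}
```

With such a check the client would keep polling the report until readyToRegister returns true
and only then read the token and register the AM.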

Is this proposal OK? If not, are there any other suggestions? If it is OK, I can
submit a patch. Please comment.

> Unmanaged AM is broken because of YARN-1493
> -------------------------------------------
>
>                 Key: YARN-1577
>                 URL: https://issues.apache.org/jira/browse/YARN-1577
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.3.0
>            Reporter: Jian He
>            Assignee: Naren Koneru
>            Priority: Blocker
>
> Today unmanaged AM client is waiting for app state to be Accepted to launch the AM. This
> is broken since we changed in YARN-1493 to start the attempt after the application is Accepted.
> We may need to introduce an attempt state report that client can rely on to query the attempt
> state and choose to launch the unmanaged AM.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
