hadoop-yarn-issues mailing list archives

From "Omkar Vinit Joshi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
Date Wed, 06 Nov 2013 02:30:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814541#comment-13814541 ]

Omkar Vinit Joshi commented on YARN-674:
----------------------------------------

Thanks [~jianhe], [~bikassaha].

bq. Saw this is changed back to asynchronous submission on recovery. The original intention
was to prevent the client from seeing the application as a new application. If this is done
asynchronously, the client can query the application before the recover event gets processed,
i.e. before the application is fully recovered, since some recovery logic happens while the
app is processing the recover event (app.FinalTransition).
Fixed to make sure that it gets updated synchronously.

bq. The assert doesn't make it to the production jar, so it won't catch anything on the cluster.
Need to throw an exception here. If we don't want to crash the RM here then we can log an
error. When the attempt state machine gets the event, it will crash on the async dispatcher
thread if the event is not handled in the current state.
Discussed with Bikas offline; this is fine.
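
For context on the assert point quoted above: Java `assert` statements are skipped at runtime unless the JVM is started with `-ea`, which is why an assert alone may not catch anything on a cluster. The sketch below contrasts an assert with an explicit check. The class and method names are hypothetical and are not taken from the YARN-674 patch.

```java
public class AssertVsException {

    // Hypothetical state check using assert.
    // The check is skipped unless the JVM runs with -ea (assertions enabled).
    static void checkWithAssert(boolean stateIsValid) {
        assert stateIsValid : "unexpected state";
    }

    // Hypothetical state check using an explicit exception.
    // This runs regardless of JVM assertion flags.
    static void checkWithException(boolean stateIsValid) {
        if (!stateIsValid) {
            throw new IllegalStateException("unexpected state");
        }
    }

    public static void main(String[] args) {
        try {
            checkWithAssert(false);
            System.out.println("assert check was skipped (assertions disabled)");
        } catch (AssertionError e) {
            System.out.println("assert check fired (assertions enabled)");
        }

        try {
            checkWithException(false);
        } catch (IllegalStateException e) {
            System.out.println("explicit check fired: " + e.getMessage());
        }
    }
}
```

Whether to throw or only log remains the judgment call discussed in the comment; the sketch only shows that the explicit check does not depend on how the JVM was launched.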

> Slow or failing DelegationToken renewals on submission itself make RM unavailable
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-674
>                 URL: https://issues.apache.org/jira/browse/YARN-674
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, YARN-674.4.patch,
YARN-674.5.patch, YARN-674.5.patch
>
>
> This was caused by YARN-280. A slow or down NameNode will make it look like the RM
is unavailable, as the RM may run out of RPC handlers due to blocked client submissions.
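
To illustrate the problem described above, here is a minimal sketch of one general way to keep a handler thread from blocking on a slow remote call: hand the slow work to a separate thread pool and return immediately. This is not the actual patch. `renewToken`, `submit`, and the pool are hypothetical stand-ins, and the delay simulates a slow NameNode.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncRenewalSketch {

    // Hypothetical stand-in for a token renewal that blocks on a slow NameNode.
    static String renewToken(String appId, long delayMillis) {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("renewal interrupted", e);
        }
        return "renewed:" + appId;
    }

    // Called on the (simulated) RPC handler thread: schedules the renewal
    // on the separate pool and returns without waiting for it to finish.
    static CompletableFuture<String> submit(String appId, long delayMillis,
                                            ExecutorService renewerPool) {
        return CompletableFuture.supplyAsync(
                () -> renewToken(appId, delayMillis), renewerPool);
    }

    public static void main(String[] args) {
        ExecutorService renewerPool = Executors.newFixedThreadPool(2);
        try {
            long start = System.nanoTime();
            CompletableFuture<String> pending = submit("app_1", 500, renewerPool);
            long handoffMillis =
                    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println("handler returned after ~" + handoffMillis + " ms");
            // join() waits for the background renewal to complete.
            System.out.println(pending.join());
        } finally {
            renewerPool.shutdown();
        }
    }
}
```

With this shape the handler thread is released while the renewal is still in flight, so a slow renewal ties up a renewer thread rather than an RPC handler.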



--
This message was sent by Atlassian JIRA
(v6.1#6144)
