hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1885) RM may not send the finished signal to some nodes where the application ran after RM restarts
Date Tue, 29 Apr 2014 23:20:17 GMT

    [ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984945#comment-13984945 ]

Wangda Tan commented on YARN-1885:
----------------------------------

[~jianhe], Thanks for your review!
bq. some places exceed the 80 column limit, like the RMAppImpl transitions.
Will correct this later
bq. app.isAppFinalStateStored() better use isAppInFinalState instead ?
Agree, using isAppFinalStateStored() there is a bug.
bq. sleeping for a fixed amount time is not deterministic, test may fail randomly. it’s
better doing it in a while loop with heartbeats, and exit out of the loop if condition meets.
Agree
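
To illustrate the review point above, here is a minimal sketch of a heartbeat-driven wait that replaces a fixed sleep. The class name `HeartbeatWait`, the `waitFor` helper, and its parameters are hypothetical and not taken from the patch; in a real test the `heartbeat` callback would be whatever call makes the mock NM heartbeat to the RM.

```java
import java.util.function.BooleanSupplier;

public class HeartbeatWait {
    /**
     * Sends a heartbeat, checks the condition, and repeats until the
     * condition holds or the timeout expires. Returns true if the
     * condition was met. All names here are illustrative assumptions.
     */
    static boolean waitFor(BooleanSupplier condition, Runnable heartbeat,
                           long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        // Compare by subtraction so the check stays correct if nanoTime wraps.
        while (deadline - System.nanoTime() > 0) {
            heartbeat.run();
            if (condition.getAsBoolean()) {
                return true; // exit as soon as the expected state is observed
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        // One last check after the deadline, without another heartbeat.
        return condition.getAsBoolean();
    }
}
```

The test then fails only when the condition is still unmet after the timeout, instead of depending on how long a single sleep happens to be sufficient on a given machine.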
bq. timeout = 600000, timeout too long.
Sorry for this typo :)
bq. these two transitions cannot happen? Generally, we should not add events to states where
the transitions can never happen, that’ll hide bugs.
Agree, and I think the SUBMITTED one also cannot happen, because an app in the SUBMITTED state
has not launched any container, so NMs will not have the app in their runningApplication list. Do you agree?
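
The reasoning about impossible transitions can be sketched with a simplified, hypothetical state machine. It is not RMAppImpl's actual transition table; the states, the event name, and the `handle` method are assumptions made for illustration. The event is registered only for states where it can legitimately arrive, so an arrival in any other state fails loudly instead of being silently absorbed.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class StrictTransitions {
    // Simplified, hypothetical subset of application states.
    enum State { SUBMITTED, RUNNING, FINISHED }

    // Hypothetical event: a node reports that it still has the app.
    enum Event { APP_RUNNING_ON_NODE }

    // Register the event only for states in which it can really occur.
    private static final Map<Event, Set<State>> ALLOWED = new EnumMap<>(Event.class);
    static {
        ALLOWED.put(Event.APP_RUNNING_ON_NODE,
                    EnumSet.of(State.RUNNING, State.FINISHED));
    }

    /** Returns the (unchanged) state, or throws if the event is not expected here. */
    static State handle(State current, Event event) {
        Set<State> ok = ALLOWED.get(event);
        if (ok == null || !ok.contains(current)) {
            // Surfacing this as an error is what exposes the underlying bug.
            throw new IllegalStateException(
                "Invalid event " + event + " in state " + current);
        }
        return current;
    }
}
```

If SUBMITTED were added to the allowed set "just in case", a bug that delivered the event to a SUBMITTED app would go unnoticed, which is the concern raised in the review.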

bq. These two loops may block the register RPC call for a while, I think we may send them
as the payload of RMNodeStartEvent and handle them in RMNodeAddTransition ?
IMO, this shouldn't be a big problem, because there are no blocking calls in handleRunningAppOnNode/handleContainerStatus.
So the additional microseconds of latency (just looping over an array) should be fine. Is that OK?
Attached new patch.
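
For reference, the alternative raised in the last review point (carrying the reported apps as an event payload rather than looping inside the register call) could look roughly like the sketch below. Every name in it is hypothetical and it does not use the real RMNodeStartEvent or dispatcher classes; it only shows the shape of the idea, assuming a simple in-memory queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class RegisterPayloadSketch {
    /** Hypothetical event carrying what the NM reported at registration. */
    static final class NodeStartedEvent {
        final String nodeId;
        final List<String> runningApps;

        NodeStartedEvent(String nodeId, List<String> runningApps) {
            this.nodeId = nodeId;
            this.runningApps = new ArrayList<>(runningApps); // defensive copy
        }
    }

    private final BlockingQueue<NodeStartedEvent> queue = new LinkedBlockingQueue<>();

    /** RPC path: only enqueue the payload, so the call returns immediately. */
    void registerNode(String nodeId, List<String> runningApps) {
        queue.add(new NodeStartedEvent(nodeId, runningApps));
    }

    int pendingEvents() {
        return queue.size();
    }

    /** Dispatcher path: process one queued event outside the RPC call. */
    List<String> handleNext() {
        List<String> notified = new ArrayList<>();
        NodeStartedEvent event = queue.poll();
        if (event == null) {
            return notified; // nothing queued
        }
        for (String app : event.runningApps) {
            notified.add(app + "@" + event.nodeId);
        }
        return notified;
    }
}
```

The trade-off is the one discussed above: the payload approach keeps the RPC path trivially short, while looping inline is simpler and costs little as long as the per-item handlers never block.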

> RM may not send the finished signal to some nodes where the application ran after RM restarts
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-1885
>                 URL: https://issues.apache.org/jira/browse/YARN-1885
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Wangda Tan
>         Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch
>
>
> During our HA testing we have seen cases where YARN application logs are not available
> through the CLI, but I can look at the AM logs through the UI. The RM was also being restarted
> in the background while the application was running.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
