hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1885) RM may not send the finished signal to some nodes where the application ran after RM restarts
Date Tue, 29 Apr 2014 20:05:19 GMT

    [ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984722#comment-13984722 ]

Jian He commented on YARN-1885:

Thanks for the update!
- Some places exceed the 80-column limit, e.g. the RMAppImpl transitions.
- For app.isAppFinalStateStored(), would it be better to use isAppInFinalState instead?
- Sleeping for a fixed amount of time is not deterministic, so the test may fail randomly. It’s better
to heartbeat in a while loop and exit the loop as soon as the condition is met.
    // sleep for a while before do next heartbeat
    NodeHeartbeatResponse response = nm1.nodeHeartbeat(true);
- timeout = 600000: this timeout is too long.
- Can these two transitions ever happen? Generally, we should not add events to states where
the transitions can never happen; that’ll hide bugs.
    .addTransition(RMAppState.NEW, RMAppState.NEW, RMAppEventType.NODE_ADDED,
        new NodeAddedTransition())
    .addTransition(RMAppState.NEW_SAVING, RMAppState.NEW_SAVING, RMAppEventType.NODE_ADDED,
        new NodeAddedTransition())
- These two loops may block the register RPC call for a while. Could we send them as
the payload of RMNodeStartEvent and handle them in RMNodeAddTransition instead?
    // Handle container statuses reported by NM
    if (!request.getContainerStatuses().isEmpty()) {
      LOG.info("received container statuses on node manager register :"
          + request.getContainerStatuses());
      for (ContainerStatus containerStatus : request.getContainerStatuses()) {
    // Handle running applications reported by NM
    if (null != request.getRunningApplications()) {
      for (ApplicationId appId : request.getRunningApplications()) {
        handleRunningAppOnNode(appId, request.getNodeId());
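
To illustrate the point about replacing the fixed sleep with a heartbeat loop, here is a minimal sketch. The class name HeartbeatPolling and the method waitFor are hypothetical and are not part of YARN or of the patch; in the actual test, the heartbeat would presumably be nm1.nodeHeartbeat(true) and the condition would inspect the response. The loop heartbeats at a short interval and exits as soon as the condition holds, with a bounded overall timeout.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of YARN: polls a condition while sending
// heartbeats, instead of sleeping for one fixed period and hoping it was enough.
final class HeartbeatPolling {
    private HeartbeatPolling() {}

    /**
     * Checks the condition, and while it is false sends a heartbeat and waits
     * intervalMs before checking again. Returns true once the condition holds,
     * or false if timeoutMs elapses (or the thread is interrupted) first.
     */
    static boolean waitFor(BooleanSupplier condition, Runnable heartbeat,
                           long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: condition never became true in time
            }
            heartbeat.run();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }
}
```

A test would then assert on the return value of waitFor rather than on state observed after a single sleep, so a slow machine only makes the test slower and does not make it fail.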

> RM may not send the finished signal to some nodes where the application ran after RM restarts
> ---------------------------------------------------------------------------------------------
>                 Key: YARN-1885
>                 URL: https://issues.apache.org/jira/browse/YARN-1885
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Wangda Tan
>         Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch
> During our HA testing we have seen cases where YARN application logs are not available
through the CLI, but I can look at the AM logs through the UI. The RM was also being restarted in the
background while the application was running.

