hadoop-yarn-issues mailing list archives

From "patrick white (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-68) NodeManager will refuse to shutdown indefinitely due to container log aggregation
Date Fri, 31 Aug 2012 18:21:07 GMT
patrick white created YARN-68:

             Summary: NodeManager will refuse to shutdown indefinitely due to container log aggregation
                 Key: YARN-68
                 URL: https://issues.apache.org/jira/browse/YARN-68
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 0.23.3
         Environment: QE
            Reporter: patrick white

The NodeManager can get into a state where containermanager.logaggregation.AppLogAggregatorImpl
apparently waits indefinitely for log aggregation to complete for an application, even if
that application has terminated abnormally and is no longer present.
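The report does not include the aggregator code. As a rough illustration only, the sketch below contrasts an unbounded "wait until aggregation completes" loop (the pattern that would produce the hang described above) with a bounded wait that gives up after a deadline. It uses a plain java.util.concurrent ExecutorService; the class BoundedAggregationWait, the method awaitAggregation, and the timeout values are hypothetical and are not taken from AppLogAggregatorImpl or any other Hadoop code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedAggregationWait {

    /**
     * Hypothetical helper: stops accepting new aggregation work, then waits for
     * running aggregators, but only until maxWaitMillis has elapsed.
     * Returns true if every task finished in time, false otherwise.
     */
    static boolean awaitAggregation(ExecutorService pool, long maxWaitMillis) {
        pool.shutdown(); // no new aggregation tasks; running ones continue
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        try {
            // An unbounded version would loop here forever if a task never ends,
            // which matches the hang described in this report.
            while (!pool.awaitTermination(100, TimeUnit.MILLISECONDS)) {
                if (System.nanoTime() - deadline >= 0) {
                    pool.shutdownNow(); // give up: interrupt the stuck aggregators
                    return false;
                }
                System.out.println("Waiting for aggregation to complete");
            }
            return true;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Simulated aggregator that never finishes on its own, standing in for
        // an application that terminated abnormally.
        pool.submit(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit when interrupted
            }
        });
        boolean finished = awaitAggregation(pool, 500);
        System.out.println("finished=" + finished);
    }
}
```

Running main prints the waiting message a few times and then finished=false, because the simulated task only exits once shutdownNow() interrupts it. Whether a hard deadline is acceptable for real log aggregation (it may leave some logs unaggregated) is a design trade-off this sketch does not settle.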

The observed behavior is that an attempt to stop the NodeManager daemon returns but has no
effect; the NM log continually displays messages similar to this:

[Thread-1]2012-08-21 17:44:07,581 INFO
Waiting for aggregation to complete for application_1345221477405_2733

The only recovery that worked for us was to 'kill -9' the NM process.

What exactly causes the NM to enter this state is unclear, but we see this behavior reliably
when the NM has run a task that failed. For example, while debugging Oozie distcp actions, when
a distcp map task failed, the NM that had been running the container entered this state and a
shutdown of that NM never completed ('never' in this case meaning we waited 2 hours before
killing the NodeManager process).

