hadoop-yarn-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-68) NodeManager will refuse to shutdown indefinitely due to container log aggregation
Date Fri, 31 Aug 2012 21:06:10 GMT

    [ https://issues.apache.org/jira/browse/YARN-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446379#comment-13446379 ]

Hadoop QA commented on YARN-68:

+1 overall.  Here are the results of testing the latest attachment 
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 1 new or modified test files.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/14//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/14//console

This message is automatically generated.
> NodeManager will refuse to shutdown indefinitely due to container log aggregation
> ---------------------------------------------------------------------------------
>                 Key: YARN-68
>                 URL: https://issues.apache.org/jira/browse/YARN-68
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.3
>         Environment: QE
>            Reporter: patrick white
>            Assignee: Daryn Sharp
>         Attachments: YARN-68-1.patch, YARN-68.patch
> The NodeManager can get into a state where containermanager.logaggregation.AppLogAggregatorImpl
> apparently waits indefinitely for log aggregation to complete for an application, even if
> that application has terminated abnormally and is no longer present.
> Observed behavior: an attempt to stop the NodeManager daemon returns but has no effect, and
> the NM log continually displays messages similar to this:
> [Thread-1]2012-08-21 17:44:07,581 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
> Waiting for aggregation to complete for application_1345221477405_2733
> The only recovery we found was to 'kill -9' the NM process.
> What exactly causes the NM to enter this state is unclear, but we see this behavior
> reliably when the NM has run a task that failed. For example, when debugging Oozie distcp
> actions where a distcp map task fails, the NM that was running the container then enters
> this state, and a shutdown of that NM never completes ('never' in this case meant waiting
> 2 hours before killing the NodeManager process).
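The repeated "Waiting for aggregation to complete" log line suggests a wait loop whose exit condition is never satisfied once the application is gone. The attached patch is not reproduced in this message, so the following is only a hypothetical sketch of the general pattern of a bounded wait: a completion flag guarded by a monitor, and a waiter that gives up after a deadline instead of blocking shutdown forever. The class and method names (`BoundedAggregationWait`, `markComplete`, `awaitCompletion`) are illustrative assumptions, not the real AppLogAggregatorImpl API or the YARN-68 fix.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch, not the actual AppLogAggregatorImpl code or the
 * YARN-68 patch: a wait for log aggregation that gives up after a deadline,
 * so a NodeManager-style shutdown cannot block forever.
 */
class BoundedAggregationWait {
    private final Object lock = new Object();
    private boolean aggregationComplete = false; // guarded by lock

    /** Called by the (hypothetical) aggregation thread when it finishes. */
    void markComplete() {
        synchronized (lock) {
            aggregationComplete = true;
            lock.notifyAll();
        }
    }

    /**
     * Waits up to timeoutMs for aggregation to finish.
     * Returns true if it completed, false on timeout or interruption.
     */
    boolean awaitCompletion(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (lock) {
            // Loop guards against spurious wakeups.
            while (!aggregationComplete) {
                long remainingNs = deadline - System.nanoTime();
                if (remainingNs <= 0) {
                    return false; // give up so shutdown can proceed
                }
                try {
                    // Releases the monitor while waiting; wakes on notifyAll or timeout.
                    TimeUnit.NANOSECONDS.timedWait(lock, remainingNs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
            return true;
        }
    }
}
```

If nothing ever calls `markComplete()` (analogous to an application that terminated abnormally), `awaitCompletion` returns `false` after the timeout and the caller can log and continue shutting down. Whether a timeout is the right remedy, as opposed to guaranteeing the completion signal for terminated applications, is a design choice this message does not settle.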

This message is automatically generated by JIRA.
