hadoop-mapreduce-issues mailing list archives

From "Hudson (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly
Date Fri, 24 Feb 2012 02:43:49 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215325#comment-13215325 ]

Hudson commented on MAPREDUCE-3738:

Integrated in Hadoop-Hdfs-0.23-Commit #574 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/574/])
    merge MAPREDUCE-3738 from trunk (Revision 1293061)

     Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java

> NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly
> ----------------------------------------------------------------------------
>                 Key: MAPREDUCE-3738
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3738
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 0.23.2
>         Attachments: MAPREDUCE-3738.patch, livehistdump.txt
> If an AppLogAggregator thread dies unexpectedly (e.g., from an uncaught exception such as
> the OutOfMemoryError in the case I saw), this leads to a hang during nodemanager shutdown.
> The NM calls AppLogAggregatorImpl.join() during shutdown to make sure log aggregation has
> completed, and that method internally waits for an atomic boolean to be set by the log
> aggregation thread to indicate it has finished. Since the thread was killed off earlier by
> the uncaught exception, the boolean is never set and the NM hangs during shutdown, repeating
> something like this every second in the log file:
> 2012-01-25 22:20:56,366 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Waiting for aggregation to complete for application_1326848182580_2806
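
The failure mode described above can be illustrated with a minimal sketch. This is not the committed patch and does not reproduce the real AppLogAggregatorImpl; the class name AggregatorSketch, the field done, and the method isDone() are hypothetical names chosen for illustration. The sketch assumes one plausible remedy: setting the completion flag in a finally block so that it is set even when the worker body throws, which lets a waiting join() return.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a log aggregator; not the real Hadoop class.
class AggregatorSketch implements Runnable {
    // Completion flag that join() waits on (hypothetical name).
    private final AtomicBoolean done = new AtomicBoolean(false);
    // The aggregation work; may throw an unchecked exception or Error.
    private final Runnable work;

    AggregatorSketch(Runnable work) {
        this.work = work;
    }

    @Override
    public void run() {
        try {
            work.run();
        } finally {
            // Runs on normal completion and on any Throwable, so the
            // flag is set even if the worker dies unexpectedly.
            synchronized (done) {
                done.set(true);
                done.notifyAll();
            }
        }
    }

    // Blocks until the flag is set, re-checking about once per second.
    public void join() throws InterruptedException {
        synchronized (done) {
            while (!done.get()) {
                done.wait(1000);
            }
        }
    }

    public boolean isDone() {
        return done.get();
    }
}
```

Without the finally block, a Throwable escaping work.run() would leave the flag unset, and join() would loop indefinitely, which matches the repeated "Waiting for aggregation to complete" log line quoted above.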

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

