hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3058) Sometimes task keeps on running while its Syslog says that it is shutdown
Date Fri, 21 Oct 2011 15:04:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132730#comment-13132730
] 

Hadoop QA commented on MAPREDUCE-3058:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12500186/MAPREDUCE-3058-20111021.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit
warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1102//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1102//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1102//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1102//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1102//console

This message is automatically generated.
                
> Sometimes task keeps on running while its Syslog says that it is shutdown
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3058
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3058
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/gridmix, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Karam Singh
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Critical
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-3058-20110923.txt, MAPREDUCE-3058-20111021.txt, MR-3058.wip.patch
>
>
> While running GridMixV3, one of the jobs got stuck for 15 hrs. After clicking on the
Job-page, found one of its reduces to be stuck. Looking at syslog of the stuck reducer, found
this:
> Task-logs' head:
> {code}
> 2011-09-19 17:57:22,002 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled
snapshot period at 10 second(s).
> 2011-09-19 17:57:22,002 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask
metrics system started
> {code}
> Task-logs' tail:
> {code}
> 2011-09-19 18:06:49,818 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as <DATANODE1>
> 2011-09-19 18:06:49,818 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block
BP-1405370709-<NAMENODE>-1316452621953:blk_-7004355226367468317_79871 in pipeline  <DATANODE2>,
 <DATANODE1>: bad datanode  <DATANODE1>
> 2011-09-19 18:06:49,818 DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtocol:
lastAckedSeqno = 26870
> 2011-09-19 18:06:49,820 DEBUG org.apache.hadoop.ipc.Client: IPC Client (26613121) connection
to <NAMENODE> from gridperf sending #454
> 2011-09-19 18:06:49,826 DEBUG org.apache.hadoop.ipc.Client: IPC Client (26613121) connection
to <<NAMENODE> from gridperf got value #454
> 2011-09-19 18:06:49,827 DEBUG org.apache.hadoop.ipc.RPC: Call: getAdditionalDatanode
8
> 2011-09-19 18:06:49,827 DEBUG org.apache.hadoop.hdfs.DFSClient: Connecting to datanode
<DATANODE2>
> 2011-09-19 18:06:49,827 DEBUG org.apache.hadoop.hdfs.DFSClient: Send buf size 131071
> 2011-09-19 18:06:49,833 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
> java.io.EOFException: Premature EOF: no length prefix available
>         at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:158)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:860)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:838)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:929)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:740)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:415)
> 2011-09-19 18:06:49,837 WARN org.apache.hadoop.mapred.YarnChild: Exception running child
: java.io.EOFException: Premature EOF: no length prefix available
>         at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:158)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:860)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:838)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:929)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:740)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:415)
> 2011-09-19 18:06:49,837 DEBUG org.apache.hadoop.ipc.Client: IPC Client (26613121) connection
to <APPMASTER> from job_1316452677984_0862 sending #455
> 2011-09-19 18:06:49,839 DEBUG org.apache.hadoop.ipc.Client: IPC Client (26613121) connection
to <APPMASTER> from job_1316452677984_0862 got value #455
> 2011-09-19 18:06:49,840 DEBUG org.apache.hadoop.ipc.RPC: Call: statusUpdate 3
> 2011-09-19 18:06:49,840 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the
task
> 2011-09-19 18:06:49,840 DEBUG org.apache.hadoop.ipc.Client: IPC Client (26613121) connection
to <NAMENODE> from gridperf sending #456
> 2011-09-19 18:06:49,858 DEBUG org.apache.hadoop.ipc.Client: IPC Client (26613121) connection
to <NAMENODE> from gridperf got value #456
> 2011-09-19 18:06:49,858 DEBUG org.apache.hadoop.ipc.RPC: Call: delete 18
> 2011-09-19 18:06:49,858 DEBUG org.apache.hadoop.ipc.Client: IPC Client (26613121) connection
to <APPMASTER> from job_1316452677984_0862 sending #457
> 2011-09-19 18:06:49,859 DEBUG org.apache.hadoop.ipc.Client: IPC Client (26613121) connection
to <APPMASTER> from job_1316452677984_0862 got value #457
> 2011-09-19 18:06:49,859 DEBUG org.apache.hadoop.ipc.RPC: Call: reportDiagnosticInfo 1
> 2011-09-19 18:06:49,859 DEBUG org.apache.hadoop.metrics2.impl.MetricsSystemImpl: refCount=1
> 2011-09-19 18:06:49,859 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping
ReduceTask metrics system...
> 2011-09-19 18:06:49,859 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping
metrics source UgiMetrics
> 2011-09-19 18:06:49,859 DEBUG org.apache.hadoop.metrics2.impl.MetricsSystemImpl: class
org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1
> 2011-09-19 18:06:49,860 DEBUG org.apache.hadoop.metrics2.util.MBeans: Unregistering Hadoop:service=ReduceTask,name=UgiMetrics
> 2011-09-19 18:06:49,860 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping
metrics source JvmMetrics
> 2011-09-19 18:06:49,860 DEBUG org.apache.hadoop.metrics2.impl.MetricsSystemImpl: class
org.apache.hadoop.metrics2.source.JvmMetrics
> 2011-09-19 18:06:49,860 DEBUG org.apache.hadoop.metrics2.util.MBeans: Unregistering Hadoop:service=ReduceTask,name=JvmMetrics
> 2011-09-19 18:06:49,860 DEBUG org.apache.hadoop.metrics2.util.MBeans: Unregistering Hadoop:service=ReduceTask,name=MetricsSystem,sub=Stats
> 2011-09-19 18:06:49,860 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask
metrics system stopped.
> 2011-09-19 18:06:49,860 DEBUG org.apache.hadoop.metrics2.util.MBeans: Unregistering Hadoop:service=ReduceTask,name=MetricsSystem,sub=Control
> 2011-09-19 18:06:49,860 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask
metrics system shutdown complete.
> {code}
> Which means that tasks is supposed to have stopped within 20 secs, whereas the process
itself is stuck for more than 15 hours. From AM log, also found that this task was sending
its update regularly. ps -ef | grep java was also showing that process is still alive.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message