hadoop-hdfs-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3364) TestFileAppend4.testRecoverFinalizedBlock occasionally times out
Date Wed, 04 Jul 2012 08:14:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406354#comment-13406354 ]

Junping Du commented on HDFS-3364:
----------------------------------

I saw this error in the pre-commit test for HADOOP-8472, and I also cannot reproduce it locally. The
full log is https://builds.apache.org/job/PreCommit-HADOOP-Build/1155//testReport/org.apache.hadoop.hdfs/TestFileAppend4/testCompleteOtherLeaseHoldersFile/
and the relevant part is as follows:
{noformat}
2012-06-29 16:26:22,187 INFO  ipc.Server (Server.java:stop(1991)) - Stopping server on 46047
2012-06-29 16:26:22,424 DEBUG datanode.DataNode (BPServiceActor.java:sendHeartBeat(431)) -
Sending heartbeat from service actor: Block pool BP-166940465-67.195.138.20-1340987178877
(storage id DS-851929818-67.195.138.20-36687-1340987179189) service to localhost/127.0.0.1:49801
2012-06-29 16:26:22,424 DEBUG datanode.DataNode (BPServiceActor.java:sendHeartBeat(431)) -
Sending heartbeat from service actor: Block pool BP-166940465-67.195.138.20-1340987178877
(storage id DS-1404243958-67.195.138.20-36424-1340987179356) service to localhost/127.0.0.1:49801
2012-06-29 16:26:22,424 DEBUG datanode.DataNode (BPServiceActor.java:sendHeartBeat(431)) -
Sending heartbeat from service actor: Block pool BP-166940465-67.195.138.20-1340987178877
(storage id DS-1672148922-67.195.138.20-53642-1340987179269) service to localhost/127.0.0.1:49801
2012-06-29 16:26:22,425 INFO  ipc.Server (Server.java:run(638)) - Stopping IPC Server listener
on 46047
2012-06-29 16:26:22,425 INFO  ipc.Server (Server.java:run(780)) - Stopping IPC Server Responder
2012-06-29 16:26:22,425 INFO  datanode.DataNode (DataNode.java:shutdown(1068)) - Waiting for
threadgroup to exit, active threads is 1
2012-06-29 16:26:22,428 WARN  ipc.Server (Server.java:processResponse(979)) - IPC Server Responder,
call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 127.0.0.1:51257:
output error
2012-06-29 16:26:22,429 INFO  ipc.Server (Server.java:run(1745)) - IPC Server handler 9 on
49801 caught an exception
java.nio.channels.ClosedChannelException
	at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
	at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2131)
	at org.apache.hadoop.ipc.Server.access$2000(Server.java:107)
	at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:930)
	at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:994)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1738)
{noformat}

From this log, the problem happens while shutting down the 3rd datanode; the previous 2 datanodes
were shut down successfully. It looks like, while the IPC server on the 3rd datanode is being shut
down, the BPServiceActor happens to be sending a heartbeat in offerService(), so some heartbeat
exceptions are thrown.
Also, the following code, which is meant to interrupt offerService(), is not taking effect, which
suggests offerService() is blocked somewhere else, such as on the heartbeat response or on
pendingIncrementalBR. Would catching InterruptedException around the whole of offerService() help
here? A rough sketch of that idea follows the snippet below.

{code}
  synchronized(pendingIncrementalBR) {
    if (waitTime > 0 && pendingReceivedRequests == 0) {
      try {
        pendingIncrementalBR.wait(waitTime);
      } catch (InterruptedException ie) {
        LOG.warn("BPOfferService for " + this + " interrupted");
      }
    }
  }
{code}
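
To make the suggestion concrete, here is a rough sketch of what catching InterruptedException
around the whole offerService() loop could look like. This is illustrative only, not the actual
BPServiceActor code; the helper names used here (shouldRun(), sendHeartBeat(), processCommand())
are simplified stand-ins:

{code}
// Illustrative sketch only -- not the real BPServiceActor.offerService().
// The idea: an interrupt during shutdown should break out of the service
// loop instead of being swallowed inside a single wait() call.
private void offerService() {
  while (shouldRun()) {                          // hypothetical running-state check
    try {
      // May block on the NameNode RPC; the comment above speculates the
      // actor could also be stuck around here (heartbeat response).
      HeartbeatResponse resp = sendHeartBeat();
      processCommand(resp.getCommands());

      synchronized (pendingIncrementalBR) {
        if (waitTime > 0 && pendingReceivedRequests == 0) {
          pendingIncrementalBR.wait(waitTime);
        }
      }
    } catch (InterruptedException ie) {
      LOG.warn("BPOfferService for " + this + " interrupted; exiting", ie);
      Thread.currentThread().interrupt();
      return;                                    // exit promptly on shutdown
    } catch (IOException ioe) {
      LOG.warn("IOException in offerService", ioe);
      // If the datanode is shutting down, shouldRun() turns false and the
      // loop exits on the next iteration.
    }
  }
}
{code}

Whether this actually covers the hang depends on where offerService() is blocked when the
interrupt arrives; if it is blocked inside the RPC itself, the interrupt may only take effect
once that call returns or fails.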
                
> TestFileAppend4.testRecoverFinalizedBlock occasionally times out
> ----------------------------------------------------------------
>
>                 Key: HDFS-3364
>                 URL: https://issues.apache.org/jira/browse/HDFS-3364
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.0.0-alpha
>            Reporter: Eli Collins
>
> I've seen TestFileAppend4.testRecoverFinalizedBlock shutdown occasionally time out in jenkins. Doesn't fail for me locally.
> {noformat}
> test timed out after 60000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 60000 milliseconds
> 	at java.lang.Object.wait(Native Method)
> 	at java.lang.Thread.join(Thread.java:1186)
> 	at java.lang.Thread.join(Thread.java:1239)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.join(BPServiceActor.java:473)
> 	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.join(BPOfferService.java:259)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.shutDownAll(BlockPoolManager.java:117)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1098)
> 	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1280)
> 	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1260)
> 	at org.apache.hadoop.hdfs.TestFileAppend4.testRecoverFinalizedBlock(TestFileAppend4.java:208)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
