hadoop-common-dev mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3657) HDFS writes get stuck trying to recoverBlock
Date Mon, 30 Jun 2008 23:56:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609407#action_12609407 ]

Tsz Wo (Nicholas), SZE commented on HADOOP-3657:
------------------------------------------------

In the datanode logs, there are many "IOException: Connection reset by peer" errors. For example:
{noformat}
2008-06-30 23:13:17,848 WARN org.apache.hadoop.dfs.DataNode: DatanodeRegistration(xx.xx.xx.xx:50297, storageID=DS-603925314-xx.xx.xx.xx-50297-1214866035317, infoPort=51131, ipcPort=50020):Got exception while serving blk_-590855607842175534_2046 to /yy.yy.yy.yy:
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
	at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:418)
	at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:519)
	at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:199)
	at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1841)
	at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1938)
	at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1096)
	at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1024)
	at java.lang.Thread.run(Thread.java:619)
{noformat}
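
The trace shows the block being sent through FileChannel.transferTo (the zero-copy path behind SocketOutputStream.transferToFully), so a client that drops its connection mid-read surfaces on the datanode side as this exception. Below is a minimal sketch of that send pattern only, not the actual BlockSender code; BlockSendSketch and sendBlockFile are illustrative names, and the real sender also interleaves packet headers and checksums that are omitted here.
{noformat}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class BlockSendSketch {
  // Simplified stand-in for the sendChunks -> transferToFully path in the trace:
  // the block file is handed to the kernel via transferTo, and a peer that
  // resets the connection mid-transfer makes transferTo throw
  // "java.io.IOException: Connection reset by peer" here.
  static void sendBlockFile(FileChannel blockFile, SocketChannel peer) throws IOException {
    long position = 0;
    long remaining = blockFile.size();
    while (remaining > 0) {
      // transferTo may move fewer bytes than requested; loop until the file is drained.
      long sent = blockFile.transferTo(position, remaining, peer);
      position += sent;
      remaining -= sent;
    }
  }
}
{noformat}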

> HDFS writes get stuck trying to recoverBlock
> --------------------------------------------
>
>                 Key: HADOOP-3657
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3657
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Arun C Murthy
>
> A few reduce tasks got stuck in a sort500 job with the following thread dump (a sketch of the locking pattern follows the dump):
> {noformat}
> "main" prio=10 tid=0x0805b800 nid=0x1951 waiting for monitor entry [0xf7e6d000..0xf7e6e1f8]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>   at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2485)
>   - waiting to lock <0xe905e8f8> (a java.util.LinkedList)
>   - locked <0xe905e928> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
>   at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
>   at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>   - locked <0xe905e928> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
>   at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
>   - locked <0xe905e928> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:58)
>   - locked <0xe905e928> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
>   at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:39)
>   at java.io.DataOutputStream.writeInt(DataOutputStream.java:181)
>   at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1014)
>   - locked <0xe90889e8> (a org.apache.hadoop.io.SequenceFile$Writer)
>   at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:70)
>   at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:298)
>   at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:39)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:316)
>   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2157)
> "DataStreamer for file /rw/out/_temporary/_attempt_200806261801_0006_r_000712_0/part-00712
block blk_-3923696991063961587_9628" daemon prio=10 tid=0x08413c00 nid=0x367a in Object.wait()
[0xd00e4000..0xd00e4f20]
>    java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:485)
>   at org.apache.hadoop.ipc.Client.call(Client.java:701)
>   - locked <0xf167d540> (a org.apache.hadoop.ipc.Client$Call)
>   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>   at org.apache.hadoop.dfs.$Proxy2.recoverBlock(Unknown Source)
>   at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2186)
>   at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1737)
>   at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1891)
>   - locked <0xe905e8f8> (a java.util.LinkedList)
> {noformat}
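
Reading the dump: the main thread is BLOCKED in DFSOutputStream.writeChunk waiting for the LinkedList monitor <0xe905e8f8>, while the DataStreamer thread holds that same monitor inside processDatanodeError and is itself WAITING on the recoverBlock RPC (Client.call), so the write cannot make progress until that call returns. The following is a schematic sketch of that locking pattern only; dataQueue, writeChunk, processDatanodeError, and recoverBlockRpc are stand-in names, not the real DFSClient code.
{noformat}
import java.util.LinkedList;

// Schematic reconstruction of the contention in the dump above, using
// illustrative stand-in names rather than the real DFSClient implementation.
public class WriteStallSketch {
  private final LinkedList<byte[]> dataQueue = new LinkedList<byte[]>();

  // Writer thread: plays the role of DFSOutputStream.writeChunk in the dump.
  void writeChunk(byte[] chunk) {
    synchronized (dataQueue) {      // BLOCKED here while the streamer holds the lock
      dataQueue.addLast(chunk);
      dataQueue.notifyAll();
    }
  }

  // Streamer thread: plays the role of DataStreamer.run -> processDatanodeError.
  void processDatanodeError() throws InterruptedException {
    synchronized (dataQueue) {      // lock held across the blocking call below
      recoverBlockRpc();            // stands in for Client.call() on recoverBlock
    }
  }

  private void recoverBlockRpc() throws InterruptedException {
    Thread.sleep(Long.MAX_VALUE);   // RPC that never returns; writeChunk stays blocked
  }
}
{noformat}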

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

