[ https://issues.apache.org/jira/browse/HDFS-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769833#action_12769833 ]

Christian Kunz commented on HDFS-732:
-------------------------------------

Concerning the file in the first comment, I found logs from 2 datanodes showing that the block size did indeed shrink from 18153472 to 17825792. 18153472 is not the correct size, but it is larger than 17825792, and I would argue that a block should never be recovered by a block of smaller size.

Logs from datanode receiving original block:

2009-10-23 19:46:47,934 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_6703874482275767879_76799972 src: /xxx.yyy.zzz.43:34608 dest: /xxx.yyy.zzz.43:uuu10
2009-10-23 21:15:59,694 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(xxx.yyy.zzz.43:uuu10, storageID=DS-243564233-xxx.yyy.zzz.43-uuu10-1254870555871, infoPort=50075, ipcPort=8020):Exception writing block blk_6703874482275767879_76799972 to mirror xxx.yyy.zzz.56:uuu10
2009-10-23 21:15:59,694 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_6703874482275767879_76799972 java.io.IOException: Connection reset by peer
2009-10-23 21:15:59,711 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_6703874482275767879_76799972 2 Exception java.io.EOFException
2009-10-23 21:15:59,713 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_6703874482275767879_76799972 2 : Thread is interrupted.
2009-10-23 21:15:59,713 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_6703874482275767879_76799972 terminating
2009-10-23 21:15:59,713 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_6703874482275767879_76799972 received exception java.io.IOException: Connection reset by peer

Logs from datanode next in the pipeline:

2009-10-23 19:46:48,174 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_6703874482275767879_76799972 src: /xxx.yyy.zzz.43:34609 dest: /xxx.yyy.zzz.56:uuu10
2009-10-23 21:15:59,661 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(xxx.yyy.zzz.56:uuu10, storageID=DS-807595239-72.30.217.12-50010-1203107050520, infoPort=50075, ipcPort=8020):Exception writing block blk_6703874482275767879_76799972 to mirror xxx.yyy.zzz.44:uuu10
2009-10-23 21:15:59,661 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_6703874482275767879_76799972 java.io.IOException: Connection reset by peer
2009-10-23 21:15:59,680 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_6703874482275767879_76799972 1 Exception java.io.EOFException
2009-10-23 21:15:59,681 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_6703874482275767879_76799972 1 : Thread is interrupted.
2009-10-23 21:15:59,681 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_6703874482275767879_76799972 terminating
2009-10-23 21:15:59,681 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_6703874482275767879_76799972 received exception java.io.IOException: Connection reset by peer
2009-10-23 21:16:00,069 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: oldblock=blk_6703874482275767879_76799972(length=18153472), newblock=blk_6703874482275767879_76840998(length=17825792), datanode=xxx.yyy.zzz.56:uuu10
2009-10-23 21:16:00,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_6703874482275767879_76840998 src: /xxx.yyy.zzz.43:36067 dest: /xxx.yyy.zzz.56:uuu10
2009-10-23 21:16:00,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reopen already-open Block for append blk_6703874482275767879_76840998
2009-10-23 21:16:00,154 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Changing block file offset of block blk_6703874482275767879_76840998 from 0 to 17825792 meta file offset to 17415
2009-10-23 21:16:00,171 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(xxx.yyy.zzz.56:uuu10, storageID=DS-807595239-72.30.217.12-50010-1203107050520, infoPort=50075, ipcPort=8020):Exception writing block blk_6703874482275767879_76840998 to mirror xxx.yyy.zzz.44:uuu10
2009-10-23 21:16:00,171 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_6703874482275767879_76840998 java.io.IOException: Connection reset by peer
2009-10-23 21:16:00,400 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_6703874482275767879_76840998 1 Exception java.nio.channels.ClosedByInterruptException
2009-10-23 21:16:00,417 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_6703874482275767879_76840998 1 : Thread is interrupted.
2009-10-23 21:16:00,417 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_6703874482275767879_76840998 terminating
2009-10-23 21:16:00,417 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_6703874482275767879_76840998 received exception java.io.IOException: Connection reset by peer

> HDFS files are ending up truncated
> ----------------------------------
>
>                 Key: HDFS-732
>                 URL: https://issues.apache.org/jira/browse/HDFS-732
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1
>            Reporter: Christian Kunz
>
> We recently started to use hadoop-0.20.1 in our production environment (less than 2 weeks ago) and already had 3 instances of truncated files, more than we had for months using hadoop-0.18.3.
> Writing is done using libhdfs, although it rather seems to be a problem on the server side.
> I will post some relevant logs (they are too large to be put into the description)
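
The argument in the comment, that recovery should never replace a block with a shorter one, can be checked mechanically against datanode logs. The following is a minimal sketch, not a Hadoop tool: it assumes the "oldblock=...(length=N), newblock=...(length=M), datanode=..." line format exactly as it appears in the log excerpt above, and the names RECOVERY_RE and find_shrunk_recoveries are hypothetical helpers introduced only for illustration. Other Hadoop versions may log recovery differently.

```python
import re

# Pattern for the recovery line seen in the excerpt above. This is an
# assumption based on that single line, not a documented log format.
# "-?" allows for negative block IDs.
RECOVERY_RE = re.compile(
    r"oldblock=(?P<old>blk_-?\d+_\d+)\(length=(?P<old_len>\d+)\), "
    r"newblock=(?P<new>blk_-?\d+_\d+)\(length=(?P<new_len>\d+)\), "
    r"datanode=(?P<datanode>\S+)"
)


def find_shrunk_recoveries(lines):
    """Return one record per recovery line whose new length is smaller."""
    shrunk = []
    for line in lines:
        match = RECOVERY_RE.search(line)
        if match is None:
            continue  # not a recovery line
        old_len = int(match.group("old_len"))
        new_len = int(match.group("new_len"))
        if new_len < old_len:
            shrunk.append({
                "old_block": match.group("old"),
                "new_block": match.group("new"),
                "datanode": match.group("datanode"),
                "lost_bytes": old_len - new_len,
            })
    return shrunk


# The recovery line from the second datanode's log, copied from above.
SAMPLE = (
    "2009-10-23 21:16:00,069 INFO "
    "org.apache.hadoop.hdfs.server.datanode.DataNode: "
    "oldblock=blk_6703874482275767879_76799972(length=18153472), "
    "newblock=blk_6703874482275767879_76840998(length=17825792), "
    "datanode=xxx.yyy.zzz.56:uuu10"
)

if __name__ == "__main__":
    for record in find_shrunk_recoveries([SAMPLE]):
        print(record)
```

Run against the sample line, the check flags the recovery on xxx.yyy.zzz.56 as having dropped 18153472 - 17825792 = 327680 bytes. To scan a whole log, pass an open file object instead of the one-element list.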