hadoop-hdfs-issues mailing list archives

From "Christian Kunz (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-732) HDFS files are ending up truncated
Date Sun, 25 Oct 2009 01:16:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769743#action_12769743 ]

Christian Kunz commented on HDFS-732:
-------------------------------------

The size of the file should have been 18654752 bytes, computed by adding up all the bytes successfully written, but it ended up being only 17825792. I verified that the missing data was at the end, i.e. the block was somehow recovered from an incomplete replica.
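
For illustration, a minimal Java sketch of this kind of check against the public FileSystem API (the class name CheckFileLength is hypothetical, not part of our client; the actual writing is done via libhdfs):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical checker: compares the file length reported by the namenode
// against the number of bytes the writer believes it wrote successfully.
public class CheckFileLength {
  public static void main(String[] args) throws Exception {
    Path path = new Path(args[0]);            // e.g. dir/output/app/9110/filename
    long expected = Long.parseLong(args[1]);  // e.g. 18654752
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus status = fs.getFileStatus(path);
    long actual = status.getLen();
    System.out.println("expected=" + expected + " actual=" + actual);
    if (actual < expected) {
      // In this instance: 18654752 - 17825792 = 828960 bytes missing at the end.
      System.out.println("file is truncated by " + (expected - actual) + " bytes");
    }
  }
}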

Here are the relevant log messages from the namenode (note that every commitBlockSynchronization reports newlength=17825792, which is exactly the truncated size):

grep blk_6703874482275767879 hadoop-user-namenode-host.log.2009-10-23

2009-10-23 19:46:47,815 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: dir/output/app/_temporary/_attempt_200910221954_0001_r_009110_0/9110/filename. blk_6703874482275767879_76799972
2009-10-23 21:16:00,028 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=blk_6703874482275767879_76799972, newgenerationstamp=76840998, newlength=17825792, newtargets=[xxx.yyy.zzz.56:uuu10, xxx.yyy.zzz.44:uuu10], closeFile=false, deleteBlock=false)
2009-10-23 21:16:00,028 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(blk_6703874482275767879_76840998) successful
2009-10-23 21:16:00,397 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=blk_6703874482275767879_76840998, newgenerationstamp=76840999, newlength=17825792, newtargets=[xxx.yyy.zzz.44:uuu10], closeFile=false, deleteBlock=false)
2009-10-23 21:16:00,397 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(blk_6703874482275767879_76840999) successful
2009-10-23 22:16:02,159 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* blk_6703874482275767879_76840999 recovery started, primary=xxx.yyy.zzz.44:uuu10
2009-10-23 22:16:02,925 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_6703874482275767879_76888761 on xxx.yyy.zzz.44:uuu10 size 17825792 But it does not belong to any file.
2009-10-23 22:16:02,925 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=blk_6703874482275767879_76840999, newgenerationstamp=76888761, newlength=17825792, newtargets=[xxx.yyy.zzz.44:uuu10], closeFile=true, deleteBlock=false)
2009-10-23 22:16:02,931 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(newblock=blk_6703874482275767879_76888761, file=dir/output/app/9110/filename, newgenerationstamp=76888761, newlength=17825792, newtargets=[xxx.yyy.zzz.44:uuu10]) successful
2009-10-23 22:16:05,112 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask xxx.yyy.zzz.44:uuu10 to replicate blk_6703874482275767879_76888761 to datanode(s) xxx.yyy.zzz.51:uuu10 xxx.yyy.zzz.237:uuu10
2009-10-23 22:16:06,310 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: xxx.yyy.zzz.51:uuu10 is added to blk_6703874482275767879_76888761 size 17825792
2009-10-23 22:16:06,312 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: xxx.yyy.zzz.237:uuu10 is added to blk_6703874482275767879_76888761 size 17825792


Relevant log messages from the task (the block went through pipeline recovery):

09/10/23 21:15:59 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_6703874482275767879_76799972 java.io.IOException: Connection reset by peer
09/10/23 21:15:59 WARN hdfs.DFSClient: Error Recovery for block blk_6703874482275767879_76799972 bad datanode[0] xxx.yyy.zzz.43:uuu10
09/10/23 21:15:59 WARN hdfs.DFSClient: Error Recovery for block blk_6703874482275767879_76799972 in pipeline xxx.yyy.zzz.43:uuu10, xxx.yyy.zzz.56:uuu10, xxx.yyy.zzz.44:uuu10: bad datanode xxx.yyy.zzz.43:uuu10
09/10/23 21:16:00 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Connection reset by peer
09/10/23 21:16:00 WARN hdfs.DFSClient: Error Recovery for block blk_6703874482275767879_76840998 bad datanode[0] xxx.yyy.zzz.56:uuu10
09/10/23 21:16:00 WARN hdfs.DFSClient: Error Recovery for block blk_6703874482275767879_76840998 in pipeline xxx.yyy.zzz.56:uuu10, xxx.yyy.zzz.44:uuu10: bad datanode xxx.yyy.zzz.56:uuu10
09/10/23 21:16:00 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Connection reset by peer
09/10/23 21:16:00 WARN hdfs.DFSClient: Error Recovery for block blk_6703874482275767879_76840999 bad datanode[0] xxx.yyy.zzz.44:uuu10
Exception in thread "main" java.io.IOException: All datanodes xxx.yyy.zzz.44:uuu10 are bad. Aborting..

The first attempt to close the file was unsuccessful; the second attempt succeeded, but with the truncated size.
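
For illustration, a minimal Java sketch of such a close-then-retry pattern (closeWithRetry is a hypothetical helper, not part of the client; whether a failed close() can safely be retried depends on the DFSClient version):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;

// Hypothetical helper: retry close() after a failure such as
// "All datanodes ... are bad. Aborting..". Shown only to illustrate
// the behavior reported above, not as a recommended workaround.
public class CloseRetry {
  static void closeWithRetry(FSDataOutputStream out, int attempts)
      throws IOException {
    for (int i = 1; i <= attempts; i++) {
      try {
        out.close();
        return;                      // close succeeded
      } catch (IOException e) {
        if (i == attempts) throw e;  // give up after the last attempt
        // close failed; fall through and try again
      }
    }
  }
}

In this instance the second close() returned normally even though the last block had already been synchronized to the shorter length, so the truncation went unnoticed by the writer.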



> HDFS files are ending up truncated
> ----------------------------------
>
>                 Key: HDFS-732
>                 URL: https://issues.apache.org/jira/browse/HDFS-732
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1
>            Reporter: Christian Kunz
>
> We recently started to use hadoop-0.20.1 in our production environment (less than 2 weeks ago) and already had 3 instances of truncated files, more than we had for months using hadoop-0.18.3.
> Writing is done using libhdfs, although it rather seems to be a problem on the server side.
> I will post some relevant logs (they are too large to be put into the description).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

