hadoop-hdfs-dev mailing list archives

From "Thanh Do (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-1232) Corrupted block if a crash happens before writing to checksumOut but after writing to dataOut
Date Thu, 17 Jun 2010 05:32:23 GMT
Corrupted block if a crash happens before writing to checksumOut but after writing to dataOut
---------------------------------------------------------------------------------------------

                 Key: HDFS-1232
                 URL: https://issues.apache.org/jira/browse/HDFS-1232
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: data-node
    Affects Versions: 0.20.1
            Reporter: Thanh Do


- Summary: block is corrupted if a crash happens before writing to checksumOut but
after writing to dataOut. 
 
- Setup:
+ # available datanodes = 1
+ # disks / datanode = 1
+ # failures = 1
+ failure type = crash
+ when/where failure happens = (see below)
 
- Details:
The order of processing a packet during a client write/append at the datanode
is: first forward the packet downstream, then write to the block (data) file, and
finally write to the checksum file. Hence, if a crash happens BEFORE the write
to the checksum file but AFTER the write to the data file, the block is corrupted.
Worse, if this is the only available replica, the block is lost.
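The crash window above can be illustrated with a small standalone sketch. This is not HDFS code; all file names and helpers are hypothetical, MD5 stands in for HDFS's per-chunk CRC32, and the "crash" is simulated by returning between the two writes. It shows that appending to the data file before the checksum file leaves a block whose stored checksums no longer cover its data:

```python
import hashlib
import os
import tempfile

# Hypothetical sketch of the ordering described in the report: the datanode
# appends to the block (data) file first and the checksum (meta) file second.
# A crash between the two writes leaves trailing data with no matching checksum.

def write_packet(data_path, meta_path, packet, crash_before_checksum=False):
    with open(data_path, "ab") as data_out:
        data_out.write(packet)                            # step 1: data file
    if crash_before_checksum:
        return                                            # simulated crash
    with open(meta_path, "ab") as checksum_out:
        checksum_out.write(hashlib.md5(packet).digest())  # step 2: checksum file

def block_is_consistent(data_path, meta_path, packet_size=4):
    # Recompute per-packet checksums and compare them with the stored ones.
    with open(data_path, "rb") as f:
        data = f.read()
    with open(meta_path, "rb") as f:
        stored = f.read()
    packets = [data[i:i + packet_size] for i in range(0, len(data), packet_size)]
    expected = b"".join(hashlib.md5(p).digest() for p in packets)
    return stored == expected

tmp = tempfile.mkdtemp()
data_path = os.path.join(tmp, "blk_1232")
meta_path = os.path.join(tmp, "blk_1232.meta")

write_packet(data_path, meta_path, b"abcd")               # clean append
ok_before_crash = block_is_consistent(data_path, meta_path)
write_packet(data_path, meta_path, b"efgh",
             crash_before_checksum=True)                  # crash mid-append
ok_after_crash = block_is_consistent(data_path, meta_path)
```

After the simulated crash, `ok_after_crash` is false: a checksum verification of the block would flag it as corrupt, and if this were the only replica the block would be lost, as described above.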
 
We also found this problem in the case where there are 3 replicas for a particular
block and two failures occur during append (see HDFS-1231).

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and 
Haryadi Gunawi (haryadi@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

