hadoop-hdfs-dev mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-7203) Concurrent appending to the same file can cause data corruption
Date Tue, 07 Oct 2014 20:34:33 GMT
Kihwal Lee created HDFS-7203:

             Summary: Concurrent appending to the same file can cause data corruption
                 Key: HDFS-7203
                 URL: https://issues.apache.org/jira/browse/HDFS-7203
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Kihwal Lee
            Priority: Blocker

When multiple threads call append against the same file, the file can get corrupted.
The root of the problem is that a stale file stat may be used for append in {{DFSClient}}.
If the file size changes between {{getFileStatus()}} and {{namenode.append()}}, {{DataStreamer}}
will get confused about how to align data to the checksum boundary and break the assumption
made by the data nodes.

When this happens, the datanode may not write the last checksum. On the next append attempt,
the datanode won't be able to reposition for the partial chunk, since the last checksum is
missing. The append will fail after running out of data nodes to copy the partial block to.
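
For concreteness, the misalignment can be illustrated with 512-byte checksum chunks (the HDFS default); the two lengths below are hypothetical:

{code:java}
public class ChunkAlignment {
    public static void main(String[] args) {
        final int bytesPerChecksum = 512; // HDFS default

        long actualLen = 700; // real length when append() reaches the namenode
        long staleLen  = 600; // length from the earlier, stale getFileStatus()

        // Bytes already sitting in the last, partial checksum chunk,
        // as computed by each side:
        long realPartial  = actualLen % bytesPerChecksum; // 188
        long stalePartial = staleLen  % bytesPerChecksum; //  88

        // DataStreamer sizes its first packet to top up the partial chunk
        // it believes exists; the client and the datanode then disagree on
        // where every subsequent chunk boundary falls.
        System.out.println("client tops up " + (bytesPerChecksum - stalePartial)
            + " bytes; the block actually needs " + (bytesPerChecksum - realPartial));
    }
}
{code}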

However, if more threads try to append, this leads to a more serious situation. Within a
few minutes, a lease recovery and block recovery will happen. The block recovery truncates
the block to the ack'ed size in order to keep only the portion of data that is
checksum-verified. The problem is that, during the last successful append, the last data node
verified the checksum and ack'ed before writing the data and the wrong metadata to disk, and
all data nodes in the pipeline wrote the same wrong metadata. So the ack'ed size includes the
corrupt portion of the data.

Since block recovery does not perform any checksum verification, the file sizes are adjusted,
and after {{commitBlockSynchronization()}}, another thread will be allowed to append to the
corrupt file. This latent corruption may go undetected for a very long time.

The first failing {{append()}} would have created a partial copy of the block in the temporary
directory of every data node in the cluster. After this failure, the block is likely
under-replicated, so it will be scheduled for replication once the file is closed. Before
HDFS-6948, replication did not work until a node was added or restarted, because the temporary
file was present on all data nodes; as a result, the corruption could not be detected by
replication. After HDFS-6948, the corruption will be detected after the file is closed by
lease recovery or a subsequent append-close.
