hadoop-hdfs-dev mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-4688) DFSClient should not allow multiple concurrent creates for the same file
Date Thu, 11 Apr 2013 19:20:14 GMT
Andrew Wang created HDFS-4688:
---------------------------------

             Summary: DFSClient should not allow multiple concurrent creates for the same file
                 Key: HDFS-4688
                 URL: https://issues.apache.org/jira/browse/HDFS-4688
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.0.3-alpha, 3.0.0
            Reporter: Andrew Wang
            Assignee: Andrew Wang


Credit to Harsh for tracking down most of this.

If a DFSClient calls create with overwrite multiple times on the same file, we can get into
bad states. The exact failure mode depends on the state of the file, but at least one
DFSOutputStream will "win" over the others, leading to data loss in the sense that data written
to the losing DFSOutputStreams is discarded. While this is perhaps acceptable given overwrite
semantics, we've also seen cases where the DFSClient loops indefinitely on close and
blocks get marked as corrupt. This is not okay.

One fix for this is to add locking to DFSClient that prevents a user from opening multiple
concurrent output streams to the same path.
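A minimal sketch of that guard, assuming a per-client map keyed by path; the class and method names here (OpenFileGuard, tryOpen, release) are illustrative, not actual DFSClient API:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-path guard a DFSClient could hold. putIfAbsent is
// atomic, so two threads racing to create the same path cannot both win.
class OpenFileGuard {
    private final ConcurrentHashMap<String, Boolean> openPaths =
            new ConcurrentHashMap<>();

    // Returns true if the caller may open an output stream for src;
    // false if another stream for the same path is already open.
    boolean tryOpen(String src) {
        return openPaths.putIfAbsent(src, Boolean.TRUE) == null;
    }

    // Called from the stream's close() so the path can be reopened later.
    void release(String src) {
        openPaths.remove(src);
    }
}
```

On a failed tryOpen the client would throw (e.g. an IOException) rather than hand out a second DFSOutputStream; the release call must also run on abort paths so a failed stream does not pin the path forever.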

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
