Date: Fri, 12 Apr 2013 04:11:17 +0000 (UTC)
From: "Fengdong Yu (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-4688) DFSClient should not allow multiple concurrent creates for the same file

    [ https://issues.apache.org/jira/browse/HDFS-4688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629767#comment-13629767 ]

Fengdong Yu commented on HDFS-4688:
-----------------------------------

I will add some tests later.

> DFSClient should not allow multiple concurrent creates for the same file
> ------------------------------------------------------------------------
>
>                 Key: HDFS-4688
>                 URL: https://issues.apache.org/jira/browse/HDFS-4688
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.0.3-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: HDFS-4688.txt, TestBadFileMaker.java
>
>
> Credit to Harsh for tracking down most of this.
> If a DFSClient calls create with overwrite multiple times on the same file, we can get into bad states. The exact failure mode depends on the state of the file, but at the very least, one DFSOutputStream will "win" over the others, leading to data loss in the sense that data written to the other DFSOutputStreams will be lost. While this is perhaps okay because of overwrite semantics, we've also seen other cases where the DFSClient loops indefinitely on close and blocks get marked as corrupt. This is not okay.
> One fix for this is adding some locking to DFSClient that prevents a user from opening multiple concurrent output streams to the same path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
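
The locking fix proposed in the issue description could look roughly like the sketch below. This is only an illustration of the idea (tracking which paths already have an open output stream in a given client and rejecting a second create for the same path). It is not the attached HDFS-4688.txt patch and not actual DFSClient code; the names PathCreateGuard, Handle, create and isOpen are hypothetical, and paths are plain strings to keep the example free of Hadoop dependencies.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: NOT the real DFSClient and NOT the attached patch.
// It shows one way a client could refuse a second concurrent create on a path.
class PathCreateGuard {
    // Paths that currently have an open output stream in this client.
    private final Set<String> openPaths = ConcurrentHashMap.newKeySet();

    /** Stand-in for an output stream; closing it releases the path. */
    public final class Handle implements AutoCloseable {
        private final String path;
        private boolean closed = false;

        private Handle(String path) {
            this.path = path;
        }

        @Override
        public synchronized void close() {
            // Idempotent close: release the path only once.
            if (!closed) {
                closed = true;
                openPaths.remove(path);
            }
        }
    }

    /** Registers the path atomically, or fails if it is already open for write. */
    public Handle create(String path) throws IOException {
        // Set.add returns false if the path was already present.
        if (!openPaths.add(path)) {
            throw new IOException(
                "Path already open for write by this client: " + path);
        }
        return new Handle(path);
    }

    /** Returns true while a handle for the path is still open. */
    public boolean isOpen(String path) {
        return openPaths.contains(path);
    }
}
```

With a guard like this, a second create on the same path fails fast with an IOException instead of producing two competing streams, and the path becomes available again once the first handle is closed. Whether the real fix should reject the second create or instead invalidate the first stream is a design question this sketch does not settle.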