Date: Mon, 5 Aug 2013 17:50:49 +0000 (UTC)
From: "Konstantin Shvachko (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-2994) If lease is recovered successfully inline with create, create can fail

    [ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729713#comment-13729713 ]

Konstantin Shvachko commented on HDFS-2994:
-------------------------------------------

Looks like the problem is still there. When a file is opened for append and softLimit has expired, recoverLeaseInternal() may finalize the file and replace myFile with the closed one. prepareFileForWrite() will then try to replace the same file again, which fails because myFile is now an outdated, invalid reference to the old inode.
The right fix is to refresh myFile after recoverLeaseInternal() rather than setting its parent field, as proposed in the attached patch.

> If lease is recovered successfully inline with create, create can fail
> ----------------------------------------------------------------------
>
>                 Key: HDFS-2994
>                 URL: https://issues.apache.org/jira/browse/HDFS-2994
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: amith
>         Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch
>
>
> I saw the following logs on my test cluster:
> {code}
> 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1
> 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed.
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6
> {code}
> It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail.

--
This message is automatically generated by JIRA.
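
The stale-reference problem described in the comment can be illustrated with a small toy model. This is only a sketch under stated assumptions: the Directory and INode classes and the two reopen... methods are hypothetical stand-ins written for illustration, not the FSNamesystem or FSDirectory code, and the identity check inside replaceNode is an assumed simplification of why the real call reports "failed to remove".

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: every class and method here is a hypothetical stand-in, not HDFS code.
class StaleInodeSketch {

    /** Minimal inode: a path plus an "under construction" flag. */
    static final class INode {
        final String path;
        final boolean underConstruction;

        INode(String path, boolean underConstruction) {
            this.path = path;
            this.underConstruction = underConstruction;
        }
    }

    /** Minimal namespace: maps each path to its current inode object. */
    static final class Directory {
        private final Map<String, INode> nodes = new HashMap<>();

        void add(INode node) {
            nodes.put(node.path, node);
        }

        INode lookup(String path) {
            return nodes.get(path);
        }

        /** Swaps oldNode for newNode; fails if oldNode is no longer the object stored. */
        boolean replaceNode(String path, INode oldNode, INode newNode) {
            if (nodes.get(path) != oldNode) {
                return false; // the caller is holding an outdated reference
            }
            nodes.put(path, newNode);
            return true;
        }

        /** Models a recovery that closes the file: the inode is swapped for a closed one. */
        void recoverLease(String path) {
            INode current = nodes.get(path);
            if (current != null && current.underConstruction) {
                nodes.put(path, new INode(path, false));
            }
        }
    }

    /** Problematic ordering: keeps using the reference obtained before recovery. */
    static boolean reopenWithStaleReference(Directory dir, String path) {
        INode myFile = dir.lookup(path);
        dir.recoverLease(path); // may replace the inode behind myFile
        return dir.replaceNode(path, myFile, new INode(path, true));
    }

    /** Ordering suggested in the comment: look the inode up again after recovery. */
    static boolean reopenWithRefreshedReference(Directory dir, String path) {
        dir.recoverLease(path);
        INode myFile = dir.lookup(path); // refreshed reference
        return dir.replaceNode(path, myFile, new INode(path, true));
    }

    public static void main(String[] args) {
        String path = "/benchmarks/TestDFSIO/io_data/test_io_6";

        Directory first = new Directory();
        first.add(new INode(path, true));
        System.out.println("stale reference succeeded: "
                + reopenWithStaleReference(first, path));

        Directory second = new Directory();
        second.add(new INode(path, true));
        System.out.println("refreshed reference succeeded: "
                + reopenWithRefreshedReference(second, path));
    }
}
```

In this model the first call reports failure and the second succeeds, which mirrors the argument for refreshing myFile after recovery instead of patching a field on the old object.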