From: "Yongjun Zhang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Mon, 2 Feb 2015 22:59:35 +0000 (UTC)
Subject: [jira] [Commented] (HDFS-7707) Edit log corruption due to delayed block removal again

[ https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302277#comment-14302277 ]

Yongjun Zhang commented on HDFS-7707:
-------------------------------------

Hi Kihwal,

Inspired by your comment
https://issues.apache.org/jira/browse/HDFS-7707?focusedCommentId=14299106&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14299106
I think I have a better solution now: instead of checking the name string, check the inode id.
Comparing the inode id recorded for the deleted file/dir against the inode id of a newly created one with the same name will mismatch, thus detecting that the file/dir was deleted.

Thanks.

> Edit log corruption due to delayed block removal again
> ------------------------------------------------------
>
>                 Key: HDFS-7707
>                 URL: https://issues.apache.org/jira/browse/HDFS-7707
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: reproduceHDFS-7707.patch
>
>
> Edit log corruption is seen again, even with the fix of HDFS-6825.
> Prior to the HDFS-6825 fix, if dirX was deleted recursively, an OP_CLOSE could get into the edit log for a fileY under dirX, thus corrupting the edit log (restarting the NN with that edit log would fail).
> What HDFS-6825 does to fix this issue is detect whether fileY is already deleted by checking the ancestor dirs on its path: if any of them doesn't exist, then fileY is already deleted, and OP_CLOSE is not put into the edit log for the file.
> For this new edit log corruption, what I found was that the client first deleted dirX recursively, then created another dir with exactly the same name as dirX right away. Because HDFS-6825 counts on the namespace check (whether dirX exists in its parent dir) to decide whether a file has been deleted, the newly created dirX defeats this check, so OP_CLOSE for the already-deleted file gets into the edit log, due to delayed block removal.
> What we need to do is have a more robust way to detect whether a file has been deleted.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
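The id-based check proposed in the comment above can be sketched as follows. This is a hypothetical, self-contained illustration, not the actual FSNamesystem code: the class and method names are invented, and the map stands in for the NameNode namespace. The key property it relies on is real, though: HDFS inode ids are allocated monotonically and never reused, so a path that is deleted and recreated ends up with a different id even though the name matches.

```java
// Hypothetical sketch of id-based deletion detection (names invented
// for illustration; not the real NameNode code). Inode ids are unique
// and never reused, so delete + recreate at the same path yields a
// different id, which a name-only check cannot see.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

class InodeIdCheckSketch {
    private static final AtomicLong NEXT_ID = new AtomicLong(1);

    // path -> inode id of the inode currently at that path
    private final Map<String, Long> namespace = new HashMap<>();

    long create(String path) {
        long id = NEXT_ID.getAndIncrement();
        namespace.put(path, id);
        return id;
    }

    void delete(String path) {
        namespace.remove(path);
    }

    // True iff the inode we remembered is still the one at this path.
    // A name-only existence check is fooled by delete + recreate; the
    // id comparison is not.
    boolean stillExists(String path, long rememberedId) {
        Long current = namespace.get(path);
        return current != null && current == rememberedId;
    }

    public static void main(String[] args) {
        InodeIdCheckSketch ns = new InodeIdCheckSketch();
        long id = ns.create("/dirX/fileY");
        ns.delete("/dirX/fileY");
        ns.create("/dirX/fileY"); // same name, fresh inode id
        // Name check alone would say the file exists; the id mismatch
        // reveals the original file was in fact deleted, so no OP_CLOSE
        // should be logged for it.
        System.out.println(ns.stillExists("/dirX/fileY", id)); // prints false
    }
}
```

In the real fix the remembered id would come from the lease/under-construction file being closed, and the lookup would walk the current namespace; the sketch only shows why comparing ids is robust where comparing names is not.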