Date: Fri, 17 Mar 2017 00:43:41 +0000 (UTC)
From: "Anu Engineer (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929244#comment-15929244 ]

Anu Engineer commented on HDFS-4015:
------------------------------------

[~danielpol] Thanks for reporting this. I will try to reproduce the case you have described. But just to make sure that we are on the same page: this patch addresses the case when the NN is in safe mode. So is your case that, with a DataNode down, you delete the directory and then reboot the DataNodes *and* the NameNode? Can you please explain the steps to reproduce this issue? Thanks in advance.
> Safemode should count and report orphaned blocks
> ------------------------------------------------
>
>                 Key: HDFS-4015
>                 URL: https://issues.apache.org/jira/browse/HDFS-4015
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Todd Lipcon
>            Assignee: Anu Engineer
>             Fix For: 2.8.0, 3.0.0-alpha1
>
>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks compared to the total number of blocks referenced by the namespace. However, it does not report the inverse: blocks which are reported by datanodes but not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can be confusing: safemode and fsck will show "corrupt files", which are the files which actually have been deleted but got resurrected by restarting from the old image. This will convince them that they can safely force leave safemode and remove these files -- after all, they know that those files should really have been deleted. However, they're not aware that leaving safemode will also unrecoverably delete a bunch of other block files which have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "900000 of expected 1000000 blocks have been reported. Additionally, 10000 blocks have been reported which do not correspond to any file in the namespace. Forcing exit of safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is the logical next step, but just reporting it as a warning seems easy enough to accomplish and worth doing.
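The reporting proposed in the description amounts to splitting the set of unique reported block IDs into those the namespace references and those it does not (the orphans). Below is a minimal sketch of that arithmetic, assuming blocks are identified by a plain `long` ID and that both sets are available in memory. The class `OrphanBlockReport` and its methods are hypothetical illustrations, not the NameNode's API or the code in the attached patches.

```java
import java.util.Set;

/**
 * Hypothetical sketch (not the HDFS-4015 patch): compares the block IDs the
 * namespace references with the block IDs DataNodes have reported, and builds
 * a safemode message in the style proposed in the issue.
 */
public final class OrphanBlockReport {
    /** Blocks referenced by the namespace that some DataNode has reported. */
    public final long reportedExpected;
    /** Total number of blocks referenced by the namespace. */
    public final long totalExpected;
    /** Reported blocks that no file in the namespace references. */
    public final long orphaned;

    private OrphanBlockReport(long reportedExpected, long totalExpected, long orphaned) {
        this.reportedExpected = reportedExpected;
        this.totalExpected = totalExpected;
        this.orphaned = orphaned;
    }

    /** Classifies every unique reported block as expected or orphaned. */
    public static OrphanBlockReport compute(Set<Long> namespaceBlocks, Set<Long> reportedBlocks) {
        long reportedExpected = 0;
        long orphaned = 0;
        for (Long id : reportedBlocks) {
            if (namespaceBlocks.contains(id)) {
                reportedExpected++;
            } else {
                orphaned++; // reported by a DataNode, unknown to the namespace
            }
        }
        return new OrphanBlockReport(reportedExpected, namespaceBlocks.size(), orphaned);
    }

    /** Builds the status line; the orphan warning is appended only when needed. */
    public String message() {
        StringBuilder sb = new StringBuilder();
        sb.append(reportedExpected).append(" of expected ").append(totalExpected)
          .append(" blocks have been reported.");
        if (orphaned > 0) {
            sb.append(" Additionally, ").append(orphaned)
              .append(" blocks have been reported which do not correspond to any file in the namespace.")
              .append(" Forcing exit of safemode will unrecoverably remove those data blocks.");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Namespace references blocks 1-4; DataNodes report 1-3 plus two unknown blocks.
        Set<Long> namespace = Set.of(1L, 2L, 3L, 4L);
        Set<Long> reported = Set.of(1L, 2L, 3L, 10L, 11L);
        System.out.println(compute(namespace, reported).message());
    }
}
```

Running `main` prints the two-part message for a namespace of four blocks where three were reported and two unknown blocks also came in. Diffing full sets keeps the idea easy to see; an implementation inside the NameNode would more likely update such counts incrementally as block reports arrive, so treat this only as an illustration of what the warning would count.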