Date: Tue, 3 May 2016 21:26:13 +0000 (UTC)
From: "Wei-Chiu Chuang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-10360) DataNode may format directory and lose blocks if current/VERSION is missing

     [ https://issues.apache.org/jira/browse/HDFS-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDFS-10360:
-----------------------------------
    Attachment: HDFS-10360.001.patch

Uploaded a proof of concept for the fix proposed in #1.

> DataNode may format directory and lose blocks if current/VERSION is missing
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-10360
>                 URL: https://issues.apache.org/jira/browse/HDFS-10360
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDFS-10360.001.patch
>
>
> Under certain circumstances, if the current/VERSION file of a storage directory is missing, the DataNode may format the storage directory even though _block files are not missing_.
> This is very easy to reproduce. Launch an HDFS cluster and create some files. Delete current/VERSION, then restart the DataNode.
> After the restart, the DataNode formats the directory and removes all existing block files:
> {noformat}
> 2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/dfs/dn/in_use.lock acquired by nodename 5314@weichiu-dn-2.vpc.cloudera.com
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /data/dfs/dn is not formatted for BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: Block pool storage directory /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642 is not formatted for BP-787466439-172.26.24.43-1462305406642
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
> 2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting block pool BP-787466439-172.26.24.43-1462305406642 directory /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642/current
> {noformat}
> The bug is: the DataNode assumes that if none of {{current/VERSION}}, {{previous/}}, {{previous.tmp/}}, {{removed.tmp/}}, {{finalized.tmp/}} and {{lastcheckpoint.tmp/}} exists, the storage directory contains nothing important to HDFS, and it decides to format the directory. However, block files may still exist, and in my opinion we should do everything possible to retain them.
> I have two suggestions:
> # Check whether the {{current/}} directory is empty. If it is not, throw an InconsistentFSStateException in {{Storage#analyzeStorage}} instead of assuming the directory is not formatted. Or,
> # In {{Storage#clearDirectory}}, before it formats the storage directory, rename or move the {{current/}} directory. Also, log whatever is being renamed or moved.
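
The two suggestions in the issue description can be illustrated with a small standalone sketch. This is not the attached patch and not the real Hadoop code: the class StorageDirGuard, its methods, the "current.bak.<timestamp>" backup name, and the stand-in exception type are all hypothetical, and the sketch uses plain java.nio rather than Hadoop's Storage classes. It only shows the shape of the two checks: refuse to treat a directory as unformatted when current/ still has entries, and move current/ aside (with a log line) before any format.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch; not org.apache.hadoop.hdfs.server.common.Storage. */
final class StorageDirGuard {

    /** Stand-in for Hadoop's InconsistentFSStateException (assumption). */
    static class InconsistentStateException extends IOException {
        InconsistentStateException(String msg) {
            super(msg);
        }
    }

    /** True if dir is an existing directory with at least one entry. */
    static boolean hasEntries(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return entries.iterator().hasNext();
        }
    }

    /**
     * Suggestion 1: if current/VERSION is missing but current/ is not empty,
     * report an inconsistent state instead of treating the directory as
     * unformatted.
     */
    static void checkSafeToFormat(Path storageRoot) throws IOException {
        Path current = storageRoot.resolve("current");
        boolean versionMissing = !Files.exists(current.resolve("VERSION"));
        if (versionMissing && hasEntries(current)) {
            throw new InconsistentStateException(
                "current/VERSION is missing but " + current
                    + " is not empty; refusing to format");
        }
    }

    /**
     * Suggestion 2: before formatting, rename a non-empty current/ to a
     * sibling backup directory and log the move. Returns the backup path,
     * or null if there was nothing to preserve.
     */
    static Path moveCurrentAside(Path storageRoot) throws IOException {
        Path current = storageRoot.resolve("current");
        if (!hasEntries(current)) {
            return null;
        }
        // Backup name is an arbitrary choice made for this sketch.
        Path backup = storageRoot.resolve(
            "current.bak." + System.currentTimeMillis());
        // Same parent directory, so this is a rename rather than a copy.
        Files.move(current, backup, StandardCopyOption.ATOMIC_MOVE);
        System.out.println(
            "Moved " + current + " to " + backup + " before formatting");
        return backup;
    }
}
```

Under these assumptions, a caller would run checkSafeToFormat before deciding that a storage directory is unformatted, or call moveCurrentAside immediately before clearing it. The first approach fails fast and leaves recovery to the operator; the second lets startup continue while keeping the block files on disk.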