Date: Mon, 26 Nov 2012 11:24:58 +0000 (UTC)
From: "Harsh J (JIRA)"
To: common-dev@hadoop.apache.org
Subject: [jira] [Resolved] (HADOOP-9091) Allow daemon startup when at least 1 (or configurable) disk is in an OK state.

     [ https://issues.apache.org/jira/browse/HADOOP-9091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HADOOP-9091.
-----------------------------

    Resolution: Fixed

This feature is already available in all our current releases via the DN volume failure toleration properties. Please see https://issues.apache.org/jira/browse/HDFS-1592.

Resolving as not a problem. Please update to a release that includes this feature to have this addressed in your environment.

> Allow daemon startup when at least 1 (or configurable) disk is in an OK state.
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-9091
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9091
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.20.2
>            Reporter: Jelle Smet
>              Labels: features, hadoop
>
> The example given is for datanode disk definitions, but it should apply to every configuration in which a list of disks is provided.
> I have defined multiple local disks for a datanode:
> <property>
>   <name>dfs.data.dir</name>
>   <value>/data/01/dfs/dn,/data/02/dfs/dn,/data/03/dfs/dn,/data/04/dfs/dn,/data/05/dfs/dn,/data/06/dfs/dn</value>
>   <final>true</final>
> </property>
> When one of those disks breaks and is unmounted, the mount point (such as /data/03 in this example) becomes a regular directory that has neither the permissions nor, possibly, the directory structure Hadoop expects.
> When this happens, the datanode fails to restart, even though enough disks are in an OK state to proceed. The only way around this is to alter the configuration and omit that specific disk.
> In my opinion, it would be more practical to let Hadoop daemons start when at least 1 disk/partition in the provided list is in a usable state. This avoids having to roll out custom configurations for systems that temporarily have a disk (and therefore its directory layout) missing. The threshold might also be configurable, so that at least X of the available partitions must be in an OK state.
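
For illustration only, the startup rule proposed in the issue description ("start when at least X of the listed directories are usable") could be sketched as below. This is not how HDFS implements it; the resolution points to HDFS-1592 and the DN volume failure toleration properties (to my knowledge, dfs.datanode.failed.volumes.tolerated in hdfs-site.xml) as the real mechanism. The names usable_dirs, can_start and min_usable are hypothetical, and "usable" is assumed here to mean nothing more than "an existing directory the process can read, write and traverse".

```python
import os
import tempfile


def usable_dirs(dirs):
    """Return the configured directories that exist and are readable,
    writable and traversable by the current process (assumed definition
    of "usable"; a real datanode checks more than this)."""
    ok = []
    for d in dirs:
        if os.path.isdir(d) and os.access(d, os.R_OK | os.W_OK | os.X_OK):
            ok.append(d)
    return ok


def can_start(dirs, min_usable=1):
    """Hypothetical startup rule from the proposal: allow startup when at
    least min_usable of the configured directories are usable."""
    if min_usable < 1:
        raise ValueError("min_usable must be at least 1")
    return len(usable_dirs(dirs)) >= min_usable


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        good = [os.path.join(root, name) for name in ("01", "02")]
        for d in good:
            os.makedirs(d)
        # This path is never created; it stands in for the failed disk.
        missing = os.path.join(root, "03")
        dirs = good + [missing]
        print(can_start(dirs, min_usable=1))  # True: two directories are usable
        print(can_start(dirs, min_usable=3))  # False: only two of three are usable
```

Note that the existing HDFS setting is phrased the other way round: it counts how many volumes may fail rather than how many must remain, so a minimum of X good disks out of N corresponds to tolerating N - X failures.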