From: "Brahma Reddy Battula (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Date: Mon, 11 Apr 2016 07:11:25 +0000 (UTC)
Subject: [jira] [Commented] (HDFS-10269) Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the datanode exit


    [ https://issues.apache.org/jira/browse/HDFS-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234591#comment-15234591 ]

Brahma Reddy Battula commented on HDFS-10269:
---------------------------------------------

One improvement we could make in this area is to move {{volFailuresTolerated}} and its validation code from {{FsDatasetImpl.java}} to {{DNConf.java}}.
With this, a misconfiguration will be detected much earlier, in the main thread itself. Currently the exception is thrown only while the storage is being initialized, which happens only after registration with one of the NameNodes.

> Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the datanode exit
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10269
>                 URL: https://issues.apache.org/jira/browse/HDFS-10269
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Lin Yiqun
>            Assignee: Lin Yiqun
>         Attachments: HDFS-10269.001.patch
>
>
> The datanode failed to start and exited when I reused the value 5 for dfs.datanode.failed.volumes.tolerated from another cluster of mine, although the new cluster has only one datadir path. This led to the "Invalid volume failure config value" error and a {{DiskErrorException}}, so the datanode shut down. The log is below:
> {code}
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage for block pool: BP-1239160341-xx.xx.xx.xx-1459929303126 : BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /home/data/hdfs/data/current/BP-1239160341-xx.xx.xx.xx-1459929303126
> 2016-04-07 09:34:45,358 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to /xx.xx.xx.xx:9000. Exiting.
> java.io.IOException: All specified directories are failed to load.
>         at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1361)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to /xx.xx.xx.xx:9000. Exiting.
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure config value: 5
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:281)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1374)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,460 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
> 2016-04-07 09:34:47,460 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
> 2016-04-07 09:34:47,462 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
> 2016-04-07 09:34:47,463 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> {code}
> IMO, this is a poor experience for users, since only a single value was configured incorrectly. Instead, we could log a warning and reset the value to the default. That would be a better way to handle this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
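
To illustrate the two approaches discussed in this thread (failing fast when the configuration is loaded, versus warning and falling back to the default), here is a minimal standalone Java sketch. It does not use or reproduce Hadoop code: the class {{VolumeToleranceCheck}} and all of its methods are hypothetical names. The validity rule (0 <= tolerated < number of configured data directories) is an assumption that is merely consistent with the log above, where a value of 5 was rejected on a node with one data directory. The default of 0 is the documented default for dfs.datanode.failed.volumes.tolerated.

```java
import java.util.logging.Logger;

/** Hypothetical helper for illustration only; not part of Hadoop. */
final class VolumeToleranceCheck {
    private static final Logger LOG =
        Logger.getLogger(VolumeToleranceCheck.class.getName());

    /** Documented default of dfs.datanode.failed.volumes.tolerated. */
    static final int DEFAULT_TOLERATED = 0;

    /** Assumed rule: the value must satisfy 0 <= tolerated < configuredDirs. */
    static boolean isValid(int tolerated, int configuredDirs) {
        return tolerated >= 0 && tolerated < configuredDirs;
    }

    /**
     * Fail-fast variant: meant to be called while the configuration is
     * loaded, so a bad value surfaces before any NameNode handshake.
     */
    static int validateOrThrow(int tolerated, int configuredDirs) {
        if (!isValid(tolerated, configuredDirs)) {
            throw new IllegalArgumentException(
                "Invalid volume failure config value: " + tolerated
                + " (configured data directories: " + configuredDirs + ")");
        }
        return tolerated;
    }

    /**
     * Lenient variant: log a warning and fall back to the default.
     * Note that with zero configured directories even the default is
     * invalid under the assumed rule; this sketch does not handle that case.
     */
    static int validateOrDefault(int tolerated, int configuredDirs) {
        if (isValid(tolerated, configuredDirs)) {
            return tolerated;
        }
        LOG.warning("Invalid volume failure config value: " + tolerated
            + "; falling back to default " + DEFAULT_TOLERATED);
        return DEFAULT_TOLERATED;
    }

    public static void main(String[] args) {
        // The scenario from the report: value 5, one data directory.
        System.out.println(validateOrDefault(5, 1)); // prints 0
        try {
            validateOrThrow(5, 1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Which variant is preferable is a design question for the issue itself: failing fast keeps a misconfigured node from silently running with a different tolerance than the operator asked for, while the fallback keeps the datanode up at the cost of ignoring the configured value.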