Return-Path: X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 0255510D9B for ; Tue, 4 Feb 2014 05:28:23 +0000 (UTC) Received: (qmail 64148 invoked by uid 500); 4 Feb 2014 05:28:20 -0000 Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org Received: (qmail 64084 invoked by uid 500); 4 Feb 2014 05:28:18 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hdfs-issues@hadoop.apache.org Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 64067 invoked by uid 99); 4 Feb 2014 05:28:18 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 04 Feb 2014 05:28:18 +0000 Date: Tue, 4 Feb 2014 05:28:18 +0000 (UTC) From: "Hadoop QA (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890401#comment-13890401 ] Hadoop QA commented on HDFS-4239: --------------------------------- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626799/hdfs-4239_v4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestReplaceDatanodeOnFailure org.apache.hadoop.hdfs.server.namenode.TestAuditLogs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6020//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6020//console This message is automatically generated. > Means of telling the datanode to stop using a sick disk > ------------------------------------------------------- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: stack > Assignee: Jimmy Xiang > Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, hdfs-4239_v4.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the total datanode. If the datanode is carrying 6 or 12 disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- the rereplication of the downed datanode's data can be pretty disruptive, especially if the cluster is doing low latency serving: e.g. hosting an hbase cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You can't unmount the disk while it is in use). This latter is better in that only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it stop using a disk an operator has designated 'bad'. This would be like option #2 above minus the need to stop and restart the datanode. Ideally the disk would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk after its been replaced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)