Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Reply-To: hdfs-issues@hadoop.apache.org
Date: Thu, 5 May 2011 15:43:03 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <324726549.24840.1304610183338.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-664) Add a way to efficiently replace a disk in a live datanode
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

[
https://issues.apache.org/jira/browse/HDFS-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029384#comment-13029384 ]

Hadoop QA commented on HDFS-664:
--------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12478285/HDFS-664.0-20-3-rc2.patch.1
  against trunk revision 1099641.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/458//console

This message is automatically generated.

> Add a way to efficiently replace a disk in a live datanode
> ----------------------------------------------------------
>
>                 Key: HDFS-664
>                 URL: https://issues.apache.org/jira/browse/HDFS-664
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node
>    Affects Versions: 0.22.0
>            Reporter: Steve Loughran
>         Attachments: HDFS-664.0-20-3-rc2.patch.1, HDFS-664.patch
>
>
> In clusters where the datanode disks are hot swappable, you need to be able to swap out a disk on a live datanode without taking down the datanode. You don't want to decommission the whole node, as that is overkill. On a system with four 1 TB HDDs, giving 3 TB of datanode storage, a decommissioning and restart will consume up to 6 TB of bandwidth. If a single disk were swapped in, there would only be 1 TB of data to recover over the network. More importantly, if that data could be moved to free space on the same machine, the recommissioning could take place at disk rates, not network speeds.
> # Maybe have a way of decommissioning a single disk on the DN; the files could be moved to space on the other disks or the other machines in the rack.
> # There may not be time to use that option, in which case the disk would be pulled out with no warning and a new disk inserted.
> # The DN needs to see that a disk has been replaced (or react to some ops request telling it this), and start using the new disk again, pushing back data and rebuilding the balance.
> To complicate the process, assume there is a live TT on the system, running jobs against the data. The TT would probably need to be paused while the work takes place, and any ongoing work handled somehow. Halting the TT and then restarting it after the replacement disk went in is probably simplest.
> The more disks you add to a node, the more this scenario becomes a need.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
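
The bandwidth figures in the quoted issue description can be checked with a small back-of-the-envelope sketch. It is not part of the issue or of any patch. It assumes that "up to 6 TB" means every stored byte leaves the node once during decommissioning and returns once after the restart; the function names and the `local_fraction` parameter are hypothetical and chosen only for illustration.

```python
def decommission_traffic_tb(stored_tb: float) -> float:
    """Network traffic (TB) for decommissioning and refilling a whole node.

    Assumption: all stored data is copied off the node once and
    copied back once, so the traffic is twice the stored volume.
    """
    return 2 * stored_tb


def disk_swap_traffic_tb(disk_tb: float, local_fraction: float = 0.0) -> float:
    """Network traffic (TB) to restore the contents of one replaced disk.

    local_fraction is a hypothetical knob: the share of the disk's data
    that was moved to free space on the same machine beforehand and can
    therefore be restored at disk rates instead of over the network.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return disk_tb * (1.0 - local_fraction)


if __name__ == "__main__":
    # Figures from the issue: 3 TB of datanode storage, 1 TB per disk.
    print("whole-node decommission:", decommission_traffic_tb(3.0), "TB")
    print("single-disk swap:", disk_swap_traffic_tb(1.0), "TB")
    print("single-disk swap, data kept local:", disk_swap_traffic_tb(1.0, 1.0), "TB")
```

With the numbers from the issue this gives 6 TB for the whole-node path and 1 TB for a single-disk swap, dropping to 0 TB of network traffic if all of the disk's data could first be moved to other disks on the same machine.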