From: "David B. Ritch" <david.ritch@gmail.com>
Date: Thu, 10 Sep 2009 23:06:49 -0400
To: common-user@hadoop.apache.org
Subject: Re: Decommissioning Individual Disks

Thank you both. That's what we did today. It seems fairly reasonable when
a node has only a few disks, say 3-5. At sites with larger nodes, though,
it becomes more awkward: when a node has a dozen or more disks (as used in
the larger terasort benchmarks), migrating the data off all of them is
likely to be a real problem. I hope there is a better solution to this
before my client moves to much larger nodes! ;-)

dbr

On 9/10/2009 10:07 PM, Amandeep Khurana wrote:
> I think decommissioning the node and replacing the disk is a cleaner
> approach. That's what I'd recommend doing as well.
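For reference, the decommission route recommended here is driven by an exclude file named in the namenode configuration. This is a minimal sketch using the 0.20-era property name; the exclude-file path is an assumption:

```xml
<!-- hadoop-site.xml / hdfs-site.xml snippet; the /etc/hadoop/conf path
     below is a hypothetical example, not a required location -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```

With that in place, the usual cycle is: add the datanode's hostname to the exclude file, run `hadoop dfsadmin -refreshNodes`, wait for the node to show as decommissioned in `hadoop dfsadmin -report` (or the namenode web UI), replace the disk, then remove the hostname and refresh again to readmit the node.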
>
> On 9/10/09, Alex Loddengaard wrote:
>> Hi David,
>>
>> Unfortunately there's really no way to do what you're hoping to do in an
>> automatic way. You can move the block files (including their .meta files)
>> from one disk to another. Do this while the datanode daemon is stopped.
>> Then, when you start the datanode daemon, it will scan dfs.data.dir and
>> be totally happy if blocks have moved between hard drives. I've never
>> tried this myself, but others on the list have suggested the technique
>> for "balancing disks."
>>
>> You could also change your process around a little. It's not too crazy to
>> decommission an entire node, replace one of its disks, and then bring it
>> back into the cluster. Seems to me that this is a much saner approach:
>> your ops team tells you which disk needs replacing, you decommission the
>> node, they replace the disk, and you add the node back to the pool. Your
>> call, I guess.
>>
>> Hope this was helpful.
>>
>> Alex
>>
>> On Thu, Sep 10, 2009 at 6:30 PM, David B. Ritch wrote:
>>
>>> What do you do with the data on a failing disk when you replace it?
>>>
>>> Our support person comes in occasionally, and often replaces several
>>> disks when he does. These are disks that have not yet failed, but whose
>>> firmware indicates that failure is imminent. We need to be able to
>>> migrate our data off these disks before replacing them. If we were
>>> replacing entire servers, we would decommission them - but we have 3
>>> data disks per server. If we were replacing one disk at a time, we
>>> wouldn't worry about it (because of redundancy). We can decommission
>>> the servers, but moving all the data off of all their disks is a waste.
>>>
>>> What's the best way to handle this?
>>>
>>> Thanks!
>>>
>>> David
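The manual block move Alex describes can be sketched in shell. The real source and destination would be directories listed in dfs.data.dir; the /tmp paths, block name, and generation stamp below are hypothetical stand-ins used purely to illustrate the move:

```shell
# Sketch, not a production script: simulate moving HDFS block files
# (each with its .meta checksum file) from a failing volume to a healthy
# one. Stop the datanode first, e.g.: bin/hadoop-daemon.sh stop datanode

OLD_VOL=/tmp/dfs_demo/disk1/current   # failing disk (hypothetical path)
NEW_VOL=/tmp/dfs_demo/disk2/current   # healthy disk (hypothetical path)
mkdir -p "$OLD_VOL" "$NEW_VOL"

# Stand in for one block file and its .meta file on the failing disk.
touch "$OLD_VOL/blk_1234567890" "$OLD_VOL/blk_1234567890_1001.meta"

# Move each block file together with its .meta file. On restart the
# datanode rescans dfs.data.dir and accepts blocks on any configured volume.
mv "$OLD_VOL"/blk_* "$NEW_VOL"/

ls "$NEW_VOL"
# Afterwards: bin/hadoop-daemon.sh start datanode
```

Because blocks are matched to their checksums by filename, the block file and its .meta file must always travel together, which the `blk_*` glob above guarantees.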