From: Eli Collins
To: general@hadoop.apache.org
Subject: Re: Adding hard-disks to an existing HDFS cluster
Date: Sun, 28 Feb 2010 14:14:37 -0800
In-Reply-To: <1703587b1002281345k118d980al3ec458cd0d74d343@mail.gmail.com>

Hey Oded,

You don't need to format HDFS to add disks to the DNs. Just format the new disks, add their directories to dfs.data.dir in the config file, and restart the DN.
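
As an illustration of the config change, here is a minimal sketch that appends new directories to the comma-separated dfs.data.dir value. It assumes the property sits in a standard Hadoop `<configuration>` XML file (typically hdfs-site.xml); the helper name `add_data_dirs` and the paths `/data/1/dfs/dn` and `/data/2/dfs/dn` are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET


def add_data_dirs(config_xml, new_dirs):
    """Return config_xml with new_dirs appended to the dfs.data.dir value."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() != "dfs.data.dir":
            continue
        value = prop.find("value")
        if value is None:
            value = ET.SubElement(prop, "value")
        # dfs.data.dir is a comma-separated list of local directories.
        current = [d.strip() for d in (value.text or "").split(",") if d.strip()]
        # Keep the existing directories so the DN still finds its old blocks.
        for d in new_dirs:
            if d not in current:
                current.append(d)
        value.text = ",".join(current)
        return ET.tostring(root, encoding="unicode")
    raise KeyError("dfs.data.dir is not set in this configuration")


# Hypothetical example: one existing data directory, one new disk.
SAMPLE = """<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs/dn</value>
  </property>
</configuration>"""

updated = add_data_dirs(SAMPLE, ["/data/2/dfs/dn"])
print(updated)
```

The DN only picks up the new value after a restart, as described above.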

The existing data on an individual DN won't be automatically balanced across its disks when you restart. Rebalancing is not necessary, though: the DN writes new blocks round-robin over all disks and stops writing to a disk when it fills up. If you want the disks to be balanced, you can do that manually by moving block files from the existing data directories to the new ones while the DN is stopped. HDFS just checks for the blocks at startup; it doesn't keep track of which directory they were stored in.

Thanks,
Eli

On Sun, Feb 28, 2010 at 1:45 PM, Oded Rosen wrote:
> We have an existing HDFS cluster with several datanodes, and we want to add
> each of the datanodes another hard-disk, as an addition to the existing
> ones.
> Is there a way of doing this without formatting the cluster? Our aim is to
> save all the data where it is, add and configure the new disks, perform a
> balance - with no format whatsoever.
> Is it possible? if so, how?
>
> Any kind of help will be welcomed.
> Thanks,
>
> --
> Oded
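
To illustrate why the reply says rebalancing is not necessary, here is a toy model of round-robin block placement that skips full disks. It is only a sketch, not Hadoop's actual code: it counts equally sized blocks rather than bytes, and the function name `place_blocks` and the capacities used are hypothetical.

```python
def place_blocks(capacities, used, num_blocks):
    """Simulate round-robin placement of equally sized blocks.

    capacities[i] is how many blocks disk i can hold and used[i] is how
    many it already holds. Returns the new per-disk block counts.
    """
    used = list(used)  # work on a copy so the caller's list is unchanged
    n = len(capacities)
    cursor = 0
    for _ in range(num_blocks):
        # Try each disk at most once per block, starting at the cursor.
        for step in range(n):
            disk = (cursor + step) % n
            if used[disk] < capacities[disk]:
                used[disk] += 1
                cursor = (disk + 1) % n
                break
        else:
            raise RuntimeError("all disks are full")
    return used


# Two old disks at 90 of 100 blocks plus one new, empty disk.
result = place_blocks([100, 100, 100], [90, 90, 0], 40)
print(result)  # [100, 100, 20]
```

In this model the old disks keep receiving writes until they fill, after which every new block lands on the new disk, so writes continue without any manual balancing.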