From: David Ginzburg
To: HDFS USER mail list
Subject: Adding new data nodes to existing cluster, with different storage capacity
Date: Thu, 20 Jan 2011 08:42:17 +0000

Hi,

Our current cluster runs 22 data nodes, each with 4TB of storage.
We will be installing new data nodes on this existing cluster, but each of them will have 8TB of storage capacity.

I am wondering how the namenode will distribute the blocks. It is my understanding that the replica placement policy chooses data nodes at random, so an even distribution is expected. Eventually the smaller nodes will fill up while the larger nodes have only reached 50%, at which point the small nodes will become unusable.

Am I correct?
Is there any recommended practice in this case? Would running the balancer periodically help?
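
To make the arithmetic behind my concern concrete, here is a rough sketch in Python. It assumes perfectly even placement across all nodes, which is only my reading of the policy and not a model of what the namenode actually does, and it ignores replication and rack awareness. The number of new nodes (NEW_NODES = 10) is a made-up figure for illustration; the function name is mine as well.

```python
def utilization_at_first_full(capacities_tb):
    """Per-node utilization at the moment the smallest node is full.

    Assumes every node receives the same amount of data (perfectly even
    placement), so each node holds exactly min(capacities_tb) terabytes.
    """
    per_node_tb = min(capacities_tb)
    return [per_node_tb / c for c in capacities_tb]


OLD_NODES = 22   # from the message: existing data nodes
OLD_TB = 4.0     # from the message: capacity of each existing node
NEW_TB = 8.0     # from the message: capacity of each new node
NEW_NODES = 10   # hypothetical: the message does not say how many are added

capacities = [OLD_TB] * OLD_NODES + [NEW_TB] * NEW_NODES
util = utilization_at_first_full(capacities)

stored_tb = min(capacities) * len(capacities)  # data held when small nodes fill
total_tb = sum(capacities)                     # raw capacity of the cluster
unused_tb = total_tb - stored_tb               # space left only on large nodes

print(f"old node utilization: {util[0]:.0%}")
print(f"new node utilization: {util[-1]:.0%}")
print(f"capacity left only on the large nodes: {unused_tb:.0f} of {total_tb:.0f} TB")
```

Under these assumptions all of the remaining free space sits on the 8TB nodes once the 4TB nodes are full, which is the situation I am asking about.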