Subject: Re: Adding datanodes to Hadoop cluster - Will data redistribute?
From: Chandrashekhar Kotekar
To: user@hadoop.apache.org
Date: Sat, 7 Feb 2015 09:42:30 +0530

First, confirm whether the new nodes have actually been added to the cluster. You can use the "hadoop dfsadmin -report" command to check per-node HDFS usage.
If the new nodes are listed in that command's output, you can run the hadoop balancer to manually redistribute some of the data.
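
To make the balancer's goal concrete, here is a rough sketch of the check it is built around: a node counts as balanced when its utilization is within a threshold (10 percent by default, adjustable with the balancer's -threshold option) of the cluster-wide utilization. The node names and usage figures below are made up for illustration and do not come from this thread, and the classification is a simplification of what the balancer does internally.

```python
# Simplified sketch of the HDFS balancer's "is this node balanced?" check.
# All node names and numbers are hypothetical, chosen to mimic a cluster
# with 6 nearly full original nodes and 2 mostly empty new ones.

def classify(nodes, threshold_pct=10.0):
    """Return (cluster utilization %, {node: status}).

    nodes maps a node name to a (used, capacity) pair in the same unit.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity

    status = {}
    for name, (used, capacity) in nodes.items():
        node_pct = 100.0 * used / capacity
        if node_pct > cluster_pct + threshold_pct:
            status[name] = "over-utilized"
        elif node_pct < cluster_pct - threshold_pct:
            status[name] = "under-utilized"
        else:
            status[name] = "within threshold"
    return cluster_pct, status


# Hypothetical usage in GB: (used, capacity)
nodes = {f"old-{i}": (900, 1000) for i in range(1, 7)}
nodes.update({f"new-{i}": (100, 1000) for i in range(1, 3)})

cluster_pct, status = classify(nodes)
print(f"cluster utilization: {cluster_pct:.1f}%")
for name in sorted(status):
    print(name, status[name])
```

In a situation like this one, the original nodes would show up as over-utilized and the new ones as under-utilized, which is the imbalance the balancer works to reduce by moving blocks between them.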

Regards,
Chandrashekhar

On 07-Feb-2015 4:24 AM, "Manoj Venkatesh" <manovenki@gmail.com> wrote:
> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes: 6 were added during cluster creation, and 2 additional nodes were added later to increase disk and CPU capacity. What I see is that processing is shared amongst all the nodes, but storage is reaching capacity on the original 6 nodes while the newly added machines still have a relatively large amount of storage unoccupied.
>
> I was wondering if there is an automated way, or any way at all, of redistributing data so that all the nodes are equally utilized. I have checked the configuration parameter dfs.datanode.fsdataset.volume.choosing.policy, which has the options 'Round Robin' or 'Available Space'. Are there any other configurations which need to be reviewed?
>
> Thanks,
> Manoj