Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of shekhar.kotekar@gmail.com
 designates 209.85.217.181 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CADx2dmyo1YmEC4Zx5iCJwEMM4i518h2eF=n86q1-c8+x0Juw_Q@mail.gmail.com>
References: 
 <CADx2dmyo1YmEC4Zx5iCJwEMM4i518h2eF=n86q1-c8+x0Juw_Q@mail.gmail.com>
Date: Sat, 7 Feb 2015 09:42:30 +0530
Message-ID: 
 <CAJwfZdque27fb=1vo5JEmneUPKk6omr_Cea5ixgf-Hp-K-Tt3g@mail.gmail.com>
Subject: Re: Adding datanodes to Hadoop cluster - Will data redistribute?
From: Chandrashekhar Kotekar <shekhar.kotekar@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=001a11361700ef4f59050e77be42

--001a11361700ef4f59050e77be42
Content-Type: text/plain; charset=UTF-8

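First confirm whether the new nodes have actually joined the cluster. The
"hadoop dfsadmin -report" command ("hdfs dfsadmin -report" on Hadoop 2.x)
lists HDFS usage per datanode. If the new nodes appear in that report, HDFS
will not move existing blocks onto them by itself; run the balancer
("hadoop balancer", or "hdfs balancer" on Hadoop 2.x) to redistribute some of
the data manually.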
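
To judge from the report whether a balancer run is worthwhile, something like
the following rough Python sketch may help. It assumes the report contains
"Name:", "Configured Capacity:" and "DFS Used:" lines for each datanode (the
exact layout differs between Hadoop versions), and it uses a made-up,
abbreviated sample rather than real cluster output. The function names and
the sample addresses are hypothetical. The 10 percent default mirrors the
balancer's documented "-threshold" default, but the check only approximates
what the balancer does.

```python
import re

# Hypothetical, abbreviated sample shaped like "hadoop dfsadmin -report"
# output; the values are invented for illustration.
SAMPLE_REPORT = """\
Name: 10.0.0.1:50010 (node1)
Configured Capacity: 1000000000000 (931.32 GB)
DFS Used: 900000000000 (838.19 GB)
DFS Used%: 90.00%

Name: 10.0.0.7:50010 (node7)
Configured Capacity: 1000000000000 (931.32 GB)
DFS Used: 100000000000 (93.13 GB)
DFS Used%: 10.00%
"""


def parse_report(text):
    """Return {datanode name: {"Configured Capacity": bytes, "DFS Used": bytes}}."""
    nodes = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Name:\s*(\S+)", line)
        if m:
            name = m.group(1)
            nodes[name] = {}
            continue
        if name is None:
            continue  # skip the cluster-wide summary before the first datanode
        # Matches "DFS Used: ..." but not "DFS Used%: ..." (the colon must follow).
        m = re.match(r"(Configured Capacity|DFS Used):\s*(\d+)", line)
        if m:
            nodes[name][m.group(1)] = int(m.group(2))
    return nodes


def unbalanced_nodes(nodes, threshold=10.0):
    """Return (cluster utilization %, {name: deviation in percentage points})
    for nodes whose utilization differs from the cluster's by more than threshold."""
    total_cap = sum(n["Configured Capacity"] for n in nodes.values())
    total_used = sum(n["DFS Used"] for n in nodes.values())
    cluster_pct = 100.0 * total_used / total_cap
    off = {}
    for name, n in nodes.items():
        node_pct = 100.0 * n["DFS Used"] / n["Configured Capacity"]
        if abs(node_pct - cluster_pct) > threshold:
            off[name] = round(node_pct - cluster_pct, 2)
    return cluster_pct, off


if __name__ == "__main__":
    cluster_pct, off = unbalanced_nodes(parse_report(SAMPLE_REPORT))
    print("cluster utilization: %.2f%%" % cluster_pct)
    for name, delta in sorted(off.items()):
        print("%s deviates by %+.2f points" % (name, delta))
```

If any node is listed, a run such as "hdfs balancer -threshold 10" is the
usual next step; a smaller threshold gives a more even spread but takes
longer to finish.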

Regards,
Chandrashekhar
On 07-Feb-2015 4:24 AM, "Manoj Venkatesh" <manovenki@gmail.com> wrote:

> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes: 6 were added during cluster creation
> and 2 more were added later to increase disk and CPU capacity.
> What I see is that processing is shared among all the nodes, but
> storage is reaching capacity on the original 6 nodes while the newly
> added machines still have a relatively large amount of unoccupied storage.
>
> I was wondering if there is an automated way, or any way at all, of
> redistributing data so that all the nodes are equally utilized. I have
> checked the configuration parameter
> *dfs.datanode.fsdataset.volume.choosing.policy*, which has the options
> 'Round Robin' and 'Available Space'. Are there any other configurations
> that need to be reviewed?
>
> Thanks,
> Manoj
>

--001a11361700ef4f59050e77be42--