hadoop-mapreduce-user mailing list archives

From Chandrashekhar Kotekar <shekhar.kote...@gmail.com>
Subject Re: Adding datanodes to Hadoop cluster - Will data redistribute?
Date Sat, 07 Feb 2015 04:12:30 GMT
First, confirm whether the new nodes have actually joined the cluster. You
can use the "hdfs dfsadmin -report" command to check per-node HDFS usage.
If the new nodes are listed in that output, you can then run the HDFS
balancer to manually redistribute some of the existing data.
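The two steps above can be sketched as shell commands (run as the HDFS
superuser; the -threshold value of 10 is an illustrative choice, not from
the original post):

```shell
# 1. Verify that the new DataNodes have registered, and inspect
#    per-node usage (Capacity / DFS Used / DFS Remaining) in the report.
hdfs dfsadmin -report

# 2. Run the balancer to move block replicas from over-utilized to
#    under-utilized DataNodes. -threshold is the allowed deviation
#    (in percent) of each node's usage from the cluster-wide average.
hdfs balancer -threshold 10
```

The balancer can be stopped safely at any time (it only moves fully
replicated blocks); its network throughput per DataNode is capped by the
dfs.datanode.balance.bandwidthPerSec property in hdfs-site.xml.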

Regards,
Chandrashekhar
On 07-Feb-2015 4:24 AM, "Manoj Venkatesh" <manovenki@gmail.com> wrote:

> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation
> and 2 additional nodes were added later to increase disk and CPU capacity.
> What i see is that processing is shared amongst all the nodes whereas the
> storage is reaching capacity on the original 6 nodes whereas the newly
> added machines have relatively large amount of storage still unoccupied.
>
> I was wondering if there is an automated or any way of redistributing data
> so that all the nodes are equally utilized. I have checked for the
> configuration parameter - *dfs.datanode.fsdataset.volume.choosing.policy*
> have options 'Round Robin' or 'Available Space', are there any other
> configurations which need to be reviewed.
>
> Thanks,
> Manoj
>
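For reference, the volume-choosing policy mentioned in the question is set
in hdfs-site.xml, but note that it only governs how a single DataNode picks
among its own local disks when writing a new block replica; it does not
affect how blocks are placed across nodes, so it is not a substitute for
running the balancer. A sketch of the 'Available Space' setting (class name
per the standard HDFS configuration):

```xml
<!-- hdfs-site.xml: prefer local volumes with more free space when a
     DataNode chooses a disk for a new block replica. -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
```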
