hadoop-common-user mailing list archives

From Arpit Agarwal <aagar...@hortonworks.com>
Subject Re: Adding datanodes to Hadoop cluster - Will data redistribute?
Date Fri, 06 Feb 2015 23:07:42 GMT
Hi Manoj,

Existing data is not automatically redistributed when you add new DataNodes. Take a look at
the 'hdfs balancer' command, which can be run as a separate administrative tool to rebalance
data distribution across DataNodes.
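
As a rough illustration of what the balancer works toward: it treats the cluster as balanced once every DataNode's utilization (used space as a percentage of capacity) is within a threshold of the overall cluster utilization. The threshold defaults to 10 percent and can be set with 'hdfs balancer -threshold <percent>'. The Python sketch below only mimics that comparison so you can see which nodes would count as over- or under-utilized. The function name, node names, and numbers are hypothetical, and the real balancer's logic is more involved than this.

```python
def classify_datanodes(nodes, threshold_pct=10.0):
    """Classify DataNodes relative to overall cluster utilization.

    nodes: dict mapping a node name to (used, capacity) in the same unit.
    Returns (cluster utilization in percent, dict of node name -> status).
    This is a simplified model, not the balancer's actual implementation.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity

    status = {}
    for name, (used, capacity) in nodes.items():
        node_pct = 100.0 * used / capacity
        if node_pct > cluster_pct + threshold_pct:
            status[name] = "over-utilized"
        elif node_pct < cluster_pct - threshold_pct:
            status[name] = "under-utilized"
        else:
            status[name] = "within threshold"
    return cluster_pct, status


if __name__ == "__main__":
    # Hypothetical cluster: six nearly full original nodes, two mostly empty new ones.
    cluster = {f"old{i}": (90, 100) for i in range(1, 7)}
    cluster.update({"new1": (10, 100), "new2": (10, 100)})

    cluster_pct, status = classify_datanodes(cluster, threshold_pct=10.0)
    print(f"cluster utilization: {cluster_pct:.1f}%")
    for name in sorted(status):
        print(name, status[name])
```

In a layout like yours, the original nodes would sit above the cluster average and the new ones below it, which is the situation the balancer is meant to correct by moving blocks from the former to the latter.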

From: Manoj Venkatesh <manovenki@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Friday, February 6, 2015 at 11:34 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Adding datanodes to Hadoop cluster - Will data redistribute?

Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes: 6 were added during cluster creation, and 2 additional
nodes were added later to increase disk and CPU capacity. What I see is that processing is
shared among all the nodes, but storage is reaching capacity on the original 6 nodes
while the newly added machines still have a relatively large amount of storage unoccupied.

I was wondering whether there is an automated way, or any way at all, of redistributing data so that
all the nodes are equally utilized. I have checked the configuration parameter
dfs.datanode.fsdataset.volume.choosing.policy, which has the options 'Round Robin' and
'Available Space'. Are there any other configurations that need to be reviewed?

