hadoop-mapreduce-user mailing list archives

From Manoj Venkatesh <manoj.venkat...@xoom.com>
Subject Re: Adding datanodes to Hadoop cluster - Will data redistribute?
Date Mon, 09 Feb 2015 18:02:13 GMT
Thank you all for answering; the hdfs balancer worked. The DataNodes' used capacity is now
more or less evenly balanced.


From: Arpit Agarwal <aagarwal@hortonworks.com>
Reply-To: user@hadoop.apache.org
Date: Friday, February 6, 2015 at 3:07 PM
To: user@hadoop.apache.org
Subject: Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Hi Manoj,

Existing data is not automatically redistributed when you add new DataNodes. Take a look at
the 'hdfs balancer' command which can be run as a separate administrative tool to rebalance
data distribution across DataNodes.
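
As a rough illustration of what the balancer's threshold means: the tool treats the cluster as
balanced once every DataNode's utilization is within a threshold (in percentage points, 10 by
default, set with 'hdfs balancer -threshold <n>') of the overall cluster utilization. The Python
sketch below is not the balancer's code. It is a simplified model of that check, and the node
names and sizes are made-up figures loosely matching the 6 + 2 node cluster described in this
thread.

```python
def classify_nodes(used, capacity, threshold_pct=10.0):
    """Group nodes by how their utilization compares with the cluster average.

    used / capacity: dicts mapping a (hypothetical) node name to a size in TB.
    Returns (cluster_utilization_pct, over, under, within_threshold).
    """
    cluster_util = 100.0 * sum(used.values()) / sum(capacity.values())
    over, under, within = [], [], []
    for node in used:
        util = 100.0 * used[node] / capacity[node]
        if util > cluster_util + threshold_pct:
            over.append(node)      # candidate source of blocks to move
        elif util < cluster_util - threshold_pct:
            under.append(node)     # candidate target for moved blocks
        else:
            within.append(node)    # already within the threshold
    return cluster_util, over, under, within


# Assumed example: six original nodes 90% full, two new nodes 10% full.
capacity = {f"old{i}": 10 for i in range(1, 7)}
capacity.update({"new1": 10, "new2": 10})
used = {f"old{i}": 9 for i in range(1, 7)}
used.update({"new1": 1, "new2": 1})

cluster_util, over, under, within = classify_nodes(used, capacity)
print(f"cluster utilization: {cluster_util:.1f}%")
print("over:", over)
print("under:", under)
print("within threshold:", within)
```

With these assumed figures every original node falls above the band and both new nodes fall
below it, which is the situation in which running the balancer moves blocks toward the new nodes.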

From: Manoj Venkatesh <manovenki@gmail.com>
Reply-To: user@hadoop.apache.org
Date: Friday, February 6, 2015 at 11:34 AM
To: user@hadoop.apache.org
Subject: Adding datanodes to Hadoop cluster - Will data redistribute?

Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes: 6 were added during cluster creation, and 2 additional
nodes were added later to increase disk and CPU capacity. What I see is that processing is
shared among all the nodes, but storage is reaching capacity on the original 6 nodes while
the newly added machines still have a relatively large amount of storage unoccupied.

I was wondering if there is an automated way, or any way at all, of redistributing data so that
all the nodes are equally utilized. I have checked the configuration parameter
dfs.datanode.fsdataset.volume.choosing.policy, which has the options 'Round Robin' and
'Available Space'. Are there any other configurations that need to be reviewed?


