hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Dynamic Cluster Node addition
Date Fri, 01 Jul 2011 05:58:29 GMT

You can inspect the data used by your new nodes after the balancer
operation runs. "hadoop dfsadmin -report" should give you detailed
stats about each of the DNs, or you can run "hadoop fsck /" for a
block-level view.

(Note: by default, the balancer is bandwidth-limited for performance
reasons, so it may take a while to finish -- although the limit is
configurable.)

On Fri, Jul 1, 2011 at 10:42 AM, Paul Rimba <paul.rimba@gmail.com> wrote:
> Hey Matei,
> what if you run bin/hadoop-daemon.sh start tasktracker and
> bin/hadoop-daemon.sh start datanode?
> Does that move the old data to the new slave?
> I ran that scenario a couple of times and then ran start-balancer.sh. It
> always says that the cluster is balanced. Does that mean the data has
> been spread out?
> Thanks
> Paul
> On Fri, Jul 1, 2011 at 2:05 PM, Matei Zaharia <matei@eecs.berkeley.edu>
> wrote:
>> You can have a new TaskTracker or DataNode join the cluster by just
>> starting that daemon on the slave (e.g. bin/hadoop-daemon.sh start
>> tasktracker) and making sure it is configured to connect to the right
>> JobTracker or NameNode (through the mapred.job.tracker and fs.default.name
>> properties in the config files). The slaves file is only used for the
>> bin/start-* and bin/stop-* scripts, but Hadoop doesn't look at it at
>> runtime. There may be other similar files that it can look at though, such
>> as a blacklist, but I think that in the default configuration you can just
>> launch the daemon and it will work.
>> Note that if you add a new DataNode, Hadoop won't automatically move old
>> data to it (to spread out the data across the cluster) unless you run the
>> HDFS rebalancer, at least as far as I know.
>> Matei
>> On Jun 30, 2011, at 8:56 PM, Paul Rimba wrote:
>> Hey there,
>> I am trying to add a new datanode/tasktracker to a currently running
>> cluster.
>> Is this feasible? And if yes, how do I change the masters, slaves and
>> dfs.replication (in hdfs-site.xml) configuration?
>> Can I add the new slave to the slaves configuration file while the cluster
>> is running?
>> I found this ./bin/hadoop dfs -setrep -w 4 /path/to/file command to change
>> the dfs.replication on the fly.
>> Is there a better way to do it?
>> Thank you for your kind attention.
>> Kind Regards,
>> Paul
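The configuration Matei refers to (pointing the new slave at the right NameNode and JobTracker) would look roughly like this on the new node; the hostname and ports below are placeholders, not values from this thread:

```xml
<!-- core-site.xml on the new slave: where the NameNode lives.
     "master.example.com:9000" is a placeholder, substitute your own. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://master.example.com:9000</value>
</property>

<!-- mapred-site.xml on the new slave: where the JobTracker lives. -->
<property>
  <name>mapred.job.tracker</name>
  <value>master.example.com:9001</value>
</property>
```

With these in place, running bin/hadoop-daemon.sh start datanode and bin/hadoop-daemon.sh start tasktracker on the slave should be enough for it to register with the running cluster, as Matei describes.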

Harsh J
