hadoop-hdfs-user mailing list archives

From Y G <gymi...@gmail.com>
Subject Re: how blocks are replicated
Date Wed, 18 Nov 2009 11:34:58 GMT
For your second question, from the Hadoop wiki
(http://wiki.apache.org/hadoop/FAQ#A6):

Q
If I add new data-nodes to the cluster will HDFS move the blocks to
the newly added nodes in order to balance disk space utilization
between the nodes?

A
No, HDFS will not move blocks to new nodes automatically. However,
newly created files will likely have their blocks placed on the new
nodes.
There are several ways to rebalance the cluster manually:
1. Select a subset of files that take up a good percentage of your disk
   space; copy them to new locations in HDFS; remove the old copies of
   the files; rename the new copies to their original names.
2. A simpler way, with no interruption of service, is to turn up the
   replication of files, wait for transfers to stabilize, and then turn
   the replication back down.
3. Yet another way to re-balance blocks is to turn off a data-node
   that is full, wait until its blocks are replicated, and then bring it
   back again. The over-replicated blocks will be randomly removed from
   different nodes, so you really get them rebalanced, not just removed
   from the current node.
4. Finally, you can use the bin/start-balancer.sh command to run a
   balancing process to move blocks around the cluster automatically.
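
As a rough sketch of the manual options above, the following Python only
builds the command lines involved; it does not run them. The path
/data/big.log, the ".rebalance" suffix, the replication values and the
threshold are made-up examples, and the helper names are mine, not part of
Hadoop. The hadoop fs -cp/-rm/-mv/-setrep commands and the balancer's
-threshold option exist in the releases I know of, but check them against
your own version before use.

```python
import shlex


def copy_and_rename_plan(path, tmp_suffix=".rebalance"):
    """Copy / remove / rename steps for a single file.

    tmp_suffix is a hypothetical naming choice for the temporary copy.
    """
    tmp = path + tmp_suffix
    return [
        ["hadoop", "fs", "-cp", path, tmp],  # new copy, blocks are placed afresh
        ["hadoop", "fs", "-rm", path],       # remove the old copy
        ["hadoop", "fs", "-mv", tmp, path],  # give the copy its original name
    ]


def replication_bump_plan(path, normal=3, bump=1):
    """Turn replication up, wait for it to settle, then turn it back down."""
    return [
        # -w waits until the higher replication has been reached
        ["hadoop", "fs", "-setrep", "-R", "-w", str(normal + bump), path],
        ["hadoop", "fs", "-setrep", "-R", str(normal), path],
    ]


def balancer_plan(threshold_pct=10):
    """Start the balancer; threshold_pct is an example value, in percent."""
    return [["bin/start-balancer.sh", "-threshold", str(threshold_pct)]]


def render(plan):
    """Turn a list of argv lists into shell-quoted command strings."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in plan]


if __name__ == "__main__":
    for plan in (
        copy_and_rename_plan("/data/big.log"),
        replication_bump_plan("/data"),
        balancer_plan(),
    ):
        for line in render(plan):
            print(line)
        print()
```

Printing the commands first makes it easy to review them before pasting
them into a shell on a node that has the Hadoop client installed. Turning
off a full data-node is left out on purpose, since how you stop and
restart a node depends on your setup.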
-----
Be happy every day
Stay healthy

Charles de Gaulle  - "The better I get to know men, the more I find
myself loving dogs."

2009/11/17 Massoud Mazar <Massoud.Mazar@avg.com>
>
> This is probably a basic question:
>
> Assuming replication is set to 3, when we store a large file in HDFS, is the whole file
> stored in 3 nodes (even if you have many more nodes) or it is broken into blocks and each
> block is written to 3 nodes? (I assume it is the latter, so when you have 30 nodes available,
> each one gets a piece of the file, providing more performance when reading the file).
>
> My second question is what happens if we add more nodes to an existing cluster? Would
> any existing blocks be moved to these new nodes to expand the distribution of the data to
> new nodes?
>
> Thanks
>
> Massoud
