hadoop-user mailing list archives

From Felix Chern <idry...@gmail.com>
Subject Re: Datanode disk considerations
Date Wed, 06 Aug 2014 20:51:40 GMT
Run the “hadoop balancer” command on the namenode. It is used for balancing skewed
data.
http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
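
To make the disk arithmetic in the quoted question concrete, here is a rough Python sketch of a single node with 2 x 500GB and 1 x 2TB disks. It is a toy model, not HDFS code: the 128 MB block size, the simulate() helper, and the two placement rules ("round_robin" and "proportional") are assumptions made up for illustration. It only shows why equal per-disk writes fill the small disks first, and what the "4 times the number of blocks on the 2TB disk" alternative would look like.

```python
from fractions import Fraction

# Disk sizes per node, taken from the question (GB).
DISKS_GB = [500, 500, 2000]
# Assumption: 128 MB blocks, so 8 blocks per GB.
BLOCKS_PER_GB = 8


def simulate(disks_gb, write_gb, policy):
    """Return GB used on each disk after writing write_gb to one node.

    policy is a hypothetical placement rule for this sketch:
      "round_robin"  - take the disks in turn, skipping full ones
      "proportional" - always pick the disk with the lowest fill fraction
    """
    cap = [d * BLOCKS_PER_GB for d in disks_gb]  # capacity in blocks
    used = [0] * len(cap)                        # blocks placed so far
    cursor = 0                                   # next disk for round robin
    for _ in range(write_gb * BLOCKS_PER_GB):
        open_disks = [i for i in range(len(cap)) if used[i] < cap[i]]
        if not open_disks:
            break  # node is completely full
        if policy == "round_robin":
            while cursor not in open_disks:
                cursor = (cursor + 1) % len(cap)
            target = cursor
            cursor = (cursor + 1) % len(cap)
        elif policy == "proportional":
            # Exact fractions avoid floating-point surprises on ties.
            target = min(open_disks, key=lambda i: Fraction(used[i], cap[i]))
        else:
            raise ValueError(f"unknown policy: {policy}")
        used[target] += 1
    return [u / BLOCKS_PER_GB for u in used]


if __name__ == "__main__":
    for policy in ("round_robin", "proportional"):
        print(policy, simulate(DISKS_GB, 1500, policy))
```

In this model, writing 1500 GB to the node with "round_robin" leaves each disk holding 500 GB, so both small disks are full while the 2TB disk is only a quarter full. With "proportional" the same 1500 GB lands as 250 GB, 250 GB and 1000 GB, which is the 4:1 block ratio mentioned in the question, and all three disks stay half full.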


On Aug 6, 2014, at 1:45 PM, Brian C. Huffman <bhuffman@etinternational.com> wrote:

> All,
> 
> We currently have a Hadoop 2.2.0 cluster with the following characteristics:
> - 4 nodes
> - Each node is a datanode
> - Each node has 3 physical disks for data: 2 x 500GB and 1 x 2TB disk.
> - HDFS replication factor of 3
> 
> It appears that our 500GB disks are filling up first (the alternative would be to put
> 4 times the number of blocks on the 2TB disks per node).  I'm concerned that once the 500GB
> disks fill, our performance will slow down (fewer spindles being read / written at the same
> time per node).  Is this correct?  Is there anything we can do to change this behavior?
> 
> Thanks,
> Brian
> 
> 

