hadoop-mapreduce-user mailing list archives

From: "Brian C. Huffman" <bhuff...@etinternational.com>
Subject: Datanode disk considerations
Date: Wed, 06 Aug 2014 20:45:00 GMT
All,

We currently have a Hadoop 2.2.0 cluster with the following characteristics:
- 4 nodes
- Each node is a datanode
- Each node has 3 physical disks for data: 2 x 500GB and 1 x 2TB disk.
- HDFS replication factor of 3

It appears that our 500GB disks are filling up first (the alternative
would be for HDFS to put 4 times as many blocks on the 2TB disk as on
each 500GB disk in a node).  I'm concerned that once the 500GB disks
fill, our performance will degrade, since fewer spindles will be read /
written at the same time per node.  Is this correct?  Is there anything
we can do to change this behavior?
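
To make the arithmetic concrete, here is a rough Python sketch of the two cases for a single node. It assumes equal-sized blocks and that writes are spread evenly across the disks until one fills, which is only my reading of what we observe, not something I have confirmed in the HDFS code. The function names are made up for illustration.

```python
from fractions import Fraction

# Data disk capacities in GB for one datanode: 2 x 500GB and 1 x 2TB.
CAPACITIES_GB = [500, 500, 2000]


def equal_split_until_first_full(capacities):
    """Assumed behavior: every disk receives the same amount of data.

    Returns (GB written to the node when the smallest disk fills,
             GB of capacity left afterwards).
    """
    written = min(capacities) * len(capacities)
    remaining = sum(capacities) - written
    return written, remaining


def proportional_shares(capacities):
    """Alternative: each disk receives blocks in proportion to its capacity."""
    total = sum(capacities)
    return [Fraction(c, total) for c in capacities]


written, remaining = equal_split_until_first_full(CAPACITIES_GB)
shares = proportional_shares(CAPACITIES_GB)
print(written, remaining)      # GB written before the 500GB disks fill, GB left
print(shares[2] / shares[0])   # how many times more blocks the 2TB disk gets
```

Under these assumptions, the 500GB disks fill after 1500GB has been written to a node, and the remaining 1500GB of that node's capacity sits on the single 2TB disk.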

Thanks,
Brian


