hadoop-hdfs-user mailing list archives

From "Brian C. Huffman" <bhuff...@etinternational.com>
Subject Datanode disk configuration
Date Wed, 12 Nov 2014 16:36:01 GMT

I'm setting up a 4-node Hadoop 2.5.1 cluster.  Each node has the 
following drives:
1 - 500GB drive (OS disk)
1 - 500GB drive
1 - 2TB drive
1 - 3TB drive

In my experience, non-uniform drive sizes have caused lots of issues 
for HDFS, but unfortunately it wasn't an option to get all 3TB or 2TB 
drives for this cluster.

My thought is to set up the 2TB and 3TB drives for HDFS and the 500GB 
drive for intermediate data.  Most of our jobs don't make large use of 
intermediate data, but at least this way, I get a good amount of space 
(2TB) per node before I run into issues.  Then I may end up using the 
AvailableSpaceVolumeChoosingPolicy to help with balancing the blocks.
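
To make the balancing concern concrete, here is a rough sketch (not Hadoop 
code) that counts how many blocks fit on a 2TB + 3TB pair before the first 
drive fills.  Everything in it is an assumption for illustration: the 
function names are made up, sizes are decimal MB, the block size is taken 
as 128 MB, and replication and reserved space are ignored.  The 
"most_free" chooser is a deterministic stand-in for a free-space-aware 
policy; as far as I know the real AvailableSpaceVolumeChoosingPolicy is 
probabilistic, so treat the second number as an upper bound rather than 
what the DataNode would actually do.

```python
# Simplified model of volume choosing on one DataNode (illustrative only).
BLOCK_MB = 128  # assumed block size


def blocks_until_first_full(capacities_mb, choose):
    """Write blocks until the chosen volume cannot hold another one.

    Returns the number of blocks written before that point.
    """
    free = list(capacities_mb)
    written = 0
    while True:
        i = choose(free, written)
        if free[i] < BLOCK_MB:
            return written
        free[i] -= BLOCK_MB
        written += 1


def round_robin(free, written):
    # Cycle through the volumes in order, ignoring how full they are.
    return written % len(free)


def most_free(free, written):
    # Stand-in for a free-space-aware policy: always pick the emptiest volume.
    return max(range(len(free)), key=lambda i: free[i])


if __name__ == "__main__":
    drives = [2_000_000, 3_000_000]  # 2TB and 3TB in decimal MB
    for name, chooser in (("round robin", round_robin), ("most free", most_free)):
        n = blocks_until_first_full(drives, chooser)
        print(f"{name}: {n} blocks, about {n * BLOCK_MB / 1_000_000:.2f} TB")
```

In this model round robin fills the 2TB drive after about 4TB has been 
written across the pair, at which point every new block would have to go 
to the 3TB drive; the free-space-aware chooser uses nearly all 5TB before 
either drive is full.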

If necessary I could put intermediate data on one of the OS partitions 
(/home).  But this doesn't seem ideal.

Anybody have any recommendations regarding the optimal use of storage in 
this scenario?

