hadoop-mapreduce-user mailing list archives

From daemeon reiydelle <daeme...@gmail.com>
Subject Re: Datanode disk configuration
Date Wed, 12 Nov 2014 16:55:39 GMT
I would consider a JBOD with a 16-64 MB stride. This would be a choice where
one or more (e.g. MR) steps will be I/O bound. Otherwise one or more tasks
will be hit with the low read/write times of having large amounts of data
behind a single spindle.
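For what it's worth, the layout described below (2 TB and 3 TB drives as HDFS volumes, the spare 500 GB drive for intermediate data, and the available-space policy to balance blocks across the unequal volumes) would look roughly like the following in Hadoop 2.5. This is only a sketch; the mount points are hypothetical and would need to match the actual filesystem layout:

```xml
<!-- hdfs-site.xml: put HDFS block data on the 2 TB and 3 TB drives
     (mount points below are made up for illustration) -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/mnt/disk2tb/hdfs/data,/mnt/disk3tb/hdfs/data</value>
</property>
<!-- choose volumes by available space rather than round-robin,
     so the 3 TB drive doesn't fill at the same rate as the 2 TB drive -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>

<!-- yarn-site.xml: keep intermediate/shuffle data on the 500 GB drive -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/mnt/disk500gb/yarn/local</value>
</property>
```

The policy also has tunable thresholds (e.g. how unbalanced the volumes must be before it prefers the emptier one), which are worth reviewing in hdfs-default.xml for the running version.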
On Nov 12, 2014 8:37 AM, "Brian C. Huffman" <bhuffman@etinternational.com> wrote:

>  All,
> I'm setting up a 4-node Hadoop 2.5.1 cluster.  Each node has the following
> drives:
> 1 - 500GB drive (OS disk)
> 1 - 500GB drive
> 1 - 2 TB drive
> 1 - 3 TB drive.
> In past experience I've had lots of issues with non-uniform drive sizes
> for HDFS, but unfortunately it wasn't an option to get all 3TB or 2TB
> drives for this cluster.
> My thought is to set up the 2TB and 3TB drives as HDFS and the 500GB drive
> as intermediate data.  Most of our jobs don't make large use of
> intermediate data, but at least this way, I get a good amount of space
> (2TB) per node before I run into issues.  Then I may end up using the AvailableSpaceVolumeChoosingPolicy
> to help with balancing the blocks.
> If necessary I could put intermediate data on one of the OS partitions
> (/home).  But this doesn't seem ideal.
> Anybody have any recommendations regarding the optimal use of storage in
> this scenario?
> Thanks,
> Brian
