That would make the volume balancing easy, but couldn't it hurt performance? My understanding is that there would be three write threads pointing at the 3TB disk and two pointing at the 2TB disk.
Would it be better from a performance perspective to include the 500GB drive in the configuration and just use the AvailableSpaceVolumeChoosingPolicy from the beginning?
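For reference, enabling that policy is a single property in hdfs-site.xml. The two tuning knobs shown below are optional; the values are the Hadoop 2.x defaults:

```xml
<!-- hdfs-site.xml -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<!-- Volumes within this many bytes of free space are considered balanced
     (default 10 GB). -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>
<!-- Fraction of new block allocations directed to the volumes with more
     free space (default 0.75). -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>
```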
On 11/12/2014 11:47 AM, Leonid Fedotov wrote:
Create 1 TB partitions on the 2 TB and 3 TB drives and you will have 5 mount points of the same size.
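As a sketch of that scheme: the 2 TB disk gets two ~1 TB partitions and the 3 TB disk gets three. The device names below (sdb for the 2 TB disk, sdc for the 3 TB disk) are assumptions; check yours with lsblk first. The actual parted commands are left commented out because they are destructive:

```shell
#!/bin/sh
# Print the planned partition layout; the parted lines that would create
# it are commented out since repartitioning destroys existing data.
for dev_tb in "sdb 2" "sdc 3"; do
    set -- $dev_tb
    dev=$1; tb=$2
    step=$((100 / tb))          # percent of the disk per ~1 TB partition
    start=0
    i=1
    while [ "$i" -le "$tb" ]; do
        if [ "$i" -eq "$tb" ]; then end=100; else end=$((start + step)); fi
        echo "/dev/${dev}${i}: ${start}% -> ${end}%"
        # parted --script "/dev/${dev}" mkpart "hdfs${i}" "${start}%" "${end}%"
        start=$end
        i=$((i + 1))
    done
done
```

Each resulting partition then gets its own filesystem and mount point, and all five are listed in dfs.datanode.data.dir.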
On Wed, Nov 12, 2014 at 8:36 AM, Brian C. Huffman <email@example.com> wrote:
I'm setting up a 4-node Hadoop 2.5.1 cluster. Each node has the following drives:
1 - 500GB drive (OS disk)
1 - 500GB drive
1 - 2 TB drive
1 - 3 TB drive
In past experience I've had lots of issues with non-uniform drive sizes for HDFS, but unfortunately it wasn't an option to get all 3TB or 2TB drives for this cluster.
My thought is to set up the 2TB and 3TB drives as HDFS and the 500GB drive as intermediate data. Most of our jobs don't make large use of intermediate data, but at least this way I get a good amount of space (2TB) per node before I run into issues. Then I may end up using the AvailableSpaceVolumeChoosingPolicy to help with balancing the blocks.
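Assuming mount points like /data1 (2 TB), /data2 (3 TB), and /scratch (500 GB), which are hypothetical names, that layout might look like:

```xml
<!-- hdfs-site.xml: HDFS block storage on the 2 TB and 3 TB drives -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data1/hdfs,/data2/hdfs</value>
</property>

<!-- yarn-site.xml: intermediate/shuffle data on the 500 GB drive -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/scratch/yarn</value>
</property>
```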
If necessary I could put intermediate data on one of the OS partitions (/home). But this doesn't seem ideal.
Anybody have any recommendations regarding the optimal use of storage in this scenario?