hadoop-general mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: Dedicated disk for operating system
Date Wed, 10 Aug 2011 17:40:14 GMT


On 8/10/11 6:50 AM, "Allen Wittenauer" <aw@apache.org> wrote:

>
>On Aug 10, 2011, at 2:22 AM, Oded Rosen wrote:
>
>> Hi,
>> What is the best practice regarding disk allocation on hadoop data
>>nodes?
>> We plan on having multiple storage disks per node, and we want to know
>>if we should save a smaller, separate disk for the os (centos).
>> Is it the suggested configuration, or is it ok to let the OS reside on
>>one of the HDFS storage disks?
>
>
>	It's a waste to put the OS on a separate disk.  Every spindle =
>performance, especially for MR spills.
>
>	I'm currently configuring:
>
>disk 1 - os, swap, app area, MR spill space, HDFS space
>disk 2 through n - swap, MR spill space, HDFS space

We do something similar, except that disk 1 does not have MR spill space.
Disk 1 is OS, logs, swap, app area, HDFS.  Disk 2 is MR spill/temp, HDFS.
Also, we put the HDFS partitions at the 'front' of the disk, where
sequential transfers are faster, and the other stuff at the end.
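
For concreteness, a rough sketch of that layout as an fstab excerpt -- the
device names, mount points, and file system choices here are made up for
illustration (adjust to your hardware); the partition ordering is the point:

  # disk 1: HDFS partition first (fast outer tracks), then OS, swap, app area
  /dev/sda1   /data/1/hdfs     xfs    defaults,noatime   0 0
  /dev/sda2   /                ext4   defaults           1 1
  /dev/sda3   none             swap   sw                 0 0
  /dev/sda4   /opt/app         ext4   defaults,noatime   0 0
  # disks 2..n: HDFS partition first, then MR spill/temp
  /dev/sdb1   /data/2/hdfs     xfs    defaults,noatime   0 0
  /dev/sdb2   /data/2/mapred   ext4   defaults,noatime   0 0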

>
>	The usual reason people say to put the OS on a separate disk is to make
>upgrades easier, since you won't have to touch the application.  The
>reality is that you're going to blow away the entire machine during an
>upgrade anyway.  So don't worry about this situation.
>
>	I know a lot of people combine the MR spill space and HDFS space onto
>the same partition, but I've found that keeping them separate has two
>advantages:
>
>	* No longer have to deal with the stupid math that HDFS uses for
>reservation--no question as to how much space one actually has
>	* A hard limit on MR space kills badly written jobs before they eat up
>enough space to nuke HDFS
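
To make that split concrete: the DataNode and TaskTracker just get pointed
at different partitions.  A minimal sketch, shown as property/value pairs
(in 0.20-era configs these live in hdfs-site.xml and mapred-site.xml; the
paths are hypothetical):

  dfs.data.dir      = /data/1/hdfs,/data/2/hdfs,/data/3/hdfs,/data/4/hdfs
  mapred.local.dir  = /data/2/mapred,/data/3/mapred,/data/4/mapred
  # dfs.datanode.du.reserved can be left alone -- the partition boundary
  # itself is the hard limit

A runaway job then fills the mapred.local.dir partitions and fails, without
eating into HDFS space.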

Furthermore, disk performance is MUCH better if you split them and
optimize the file system and mount parameters for the different workloads.
Putting M/R spill in the same place as HDFS was causing a lot of random
seeks for us and throttling HDFS performance.
*  HDFS needs a file system and mount parameters optimized for mostly
sequential writes and reads (plus some random reads).  It is also not
metadata heavy.  We found that XFS worked very well for this, and it has an
online defragmenter we use to keep that partition in good shape.  We are
not disk I/O bound in HDFS with 4 drives/server this way.  Ext4 is an
option too, but has no online defragmenter.  Ext3 got really fragmented
after a while, causing nodes to become I/O bound more regularly as they
aged.  Ext4 should be much better at avoiding fragmentation than ext3.
*  M/R spill is metadata intensive, with many small reads and writes in
addition to larger ones, and files that come and go.  We found that using
ext4 for this, with optimized mount parameters
(rw,noatime,nobarrier,data=writeback,commit=30), tremendously reduced I/O
for M/R temp, since many files didn't even live the 30 seconds needed to be
flushed to disk.  These settings are not appropriate for the HDFS
partition.  XFS is a horrible option for M/R spill and temp -- it performs
very poorly with those workloads (a rough mkfs/mount sketch follows below).
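
As a rough sketch of what that looks like at the file system level -- the
devices and mount points are hypothetical, and the XFS options shown are
just a generic baseline (the ext4 options are the ones quoted above):

  # HDFS partition: XFS, mostly sequential I/O, defragmented online
  mkfs.xfs /dev/sdb1
  mount -o noatime /dev/sdb1 /data/2/hdfs
  xfs_fsr /data/2/hdfs     # online defragmenter, run periodically (e.g. cron)

  # M/R spill/temp partition: ext4 with relaxed journaling/flush settings
  mkfs.ext4 /dev/sdb2
  mount -o rw,noatime,nobarrier,data=writeback,commit=30 /dev/sdb2 /data/2/mapred

With commit=30, dirty data can sit in memory for up to ~30 seconds before
the journal forces it out, which is why short-lived spill files often never
hit the disk at all.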

>
>	Of course, the big disadvantage is that one needs to calculate the
>correct space needed, and that's a toughie.  But if you know your
>applications, it's not a problem.  Besides, if one gets it wrong, you can
>always do a rolling re-install to fix it.
>
>	Also note that in this configuration one cannot take advantage of the
>"keep the machine up at all costs" features in newer Hadoop releases, which
>require that root, swap, and the log area be mirrored to be truly
>effective.  I'm not quite convinced that those features are worth it yet
>for anything smaller than maybe a 12-disk config.

