hadoop-general mailing list archives

From Oded Rosen <o...@legolas-media.com>
Subject RE: Dedicated disk for operating system
Date Wed, 10 Aug 2011 14:25:16 GMT
This is helpful

-----Original Message-----
From: Allen Wittenauer [mailto:aw@apache.org] 
Sent: Wednesday, August 10, 2011 4:50 PM
To: general@hadoop.apache.org
Subject: Re: Dedicated disk for operating system

On Aug 10, 2011, at 2:22 AM, Oded Rosen wrote:

> Hi,
> What is the best practice regarding disk allocation on hadoop data nodes?
> We plan on having multiple storage disks per node, and we want to know if we should save
a smaller, separate disk for the os (centos).
> Is it the suggested configuration, or is it ok to let the OS reside on one of the HDFS
storage disks?

	It's a waste to put the OS on a separate disk.  Every spindle = performance, especially for MR spills.

	I'm currently configuring:

disk 1           - os, swap, app area, MR spill space, HDFS space
disk 2 through n - swap, MR spill space, HDFS space

	The usual reason people give for putting the OS on a separate disk is to make upgrades easier, since you won't have to touch the application area.  The reality is that you're going to blow away the entire machine during an upgrade anyway, so don't worry about this situation.

	I know a lot of people combine the MR spill space and HDFS space onto the same partition, but I've found that keeping them separate has two advantages:

	* No longer having to deal with the stupid math that HDFS uses for reservation: no question as to how much space one actually has
	* A hard limit on MR space kills badly written jobs before they eat up enough space to nuke the node
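
	The reservation math in question is the dfs.datanode.du.reserved-style accounting on a shared partition.  A rough sketch of the arithmetic (illustrative only, not HDFS's exact internal formula) shows why a dedicated partition is easier to reason about:

```python
def hdfs_available_shared(disk_total_gb, du_reserved_gb, dfs_used_gb, non_dfs_used_gb):
    # Shared partition: what HDFS can still use depends on the reserved amount
    # AND on non-DFS usage (MR spill, logs), which varies from job to job,
    # so the "available" number keeps moving.
    return disk_total_gb - du_reserved_gb - dfs_used_gb - non_dfs_used_gb

def hdfs_available_dedicated(partition_gb, dfs_used_gb):
    # Dedicated partition: nothing to subtract but DFS usage itself.
    return partition_gb - dfs_used_gb

# Hypothetical 2 TB disk: 100 GB reserved, 500 GB of blocks, 300 GB of spill
print(hdfs_available_shared(2000, 100, 500, 300))   # -> 1100
# Same disk split up front: a fixed 1500 GB HDFS partition, 500 GB of blocks
print(hdfs_available_dedicated(1500, 500))          # -> 1000
```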

	Of course, the big disadvantage is that one needs to calculate the correct space up front, and that's a toughie.  But if you know your applications, it's not a problem.  Besides, if one gets it wrong, you can always do a rolling re-install to fix it.

	Also note that in this configuration one cannot take advantage of the "keep the machine up at all costs" features in newer Hadoop releases, which require that root, swap, and the log area be mirrored to be truly effective.  I'm not quite convinced those features are worth it yet for anything smaller than maybe a 12-disk config.
