Subject: Re: Dedicated disk for operating system
From: Allen Wittenauer
Date: Wed, 10 Aug 2011 06:50:29 -0700
Message-Id: <044F0CA2-FC84-4720-A3DE-2D45329EC865@apache.org>
Reply-To: general@hadoop.apache.org

On Aug 10, 2011, at 2:22 AM, Oded Rosen wrote:

> Hi,
> What is the best practice regarding disk allocation on hadoop data nodes?
> We plan on having multiple storage disks per node, and we want to know if we should save a smaller, separate disk for the os (centos).
> Is it the suggested configuration, or is it ok to let the OS reside on one of the HDFS storage disks?

It's a waste to put the OS on a separate disk. Every spindle = performance, esp. for MR spills.

I'm currently configuring:

disk 1 - os, swap, app area, MR spill space, HDFS space
disk 2 through n - swap, MR spill space, HDFS space

The usual reason people give for putting the OS on separate space is to make upgrades easier, since you won't have to touch the application. The reality is that you're going to blow away the entire machine during an upgrade anyway. So don't worry about this situation.

I know a lot of people combine the MR spill space and HDFS space onto the same partition, but I've found that keeping them separate has two advantages:

* No longer have to deal with the stupid math that HDFS uses for reservation--no question as to how much space one actually has
* A hard limit on MR space kills badly written jobs before they eat up enough space to nuke HDFS

Of course, the big disadvantage is that one needs to calculate the correct space needed, and that's a toughie. But if you know your applications, it's not a problem. Besides, if one gets it wrong, you can always do a rolling re-install to fix it.

Also note that in this configuration one cannot take advantage of the "keep the machine up at all costs" features in newer Hadoop releases, which require that root, swap, and the log area be mirrored to be truly effective. I'm not quite convinced that those features are worth it yet for anything smaller than maybe a 12 disk config.
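
For what it's worth, the per-disk layout described above (a separate MR spill partition and HDFS partition on every disk) might translate into directory lists along these lines. This is only a rough sketch: the mount points (/data/N/hdfs and /data/N/mapred) and the helper name disk_layout are made up for illustration, and dfs.data.dir / mapred.local.dir are the property names from Hadoop releases of this era as I understand them, so check them against the defaults shipped with your version.

```python
def disk_layout(num_disks, mount_root="/data"):
    """Build comma-separated directory lists with one entry per disk.

    Assumes (hypothetically) that each disk N carries two partitions,
    mounted at <mount_root>/N/hdfs and <mount_root>/N/mapred.
    """
    if num_disks < 1:
        raise ValueError("need at least one disk")
    disks = range(1, num_disks + 1)
    hdfs_dirs = [f"{mount_root}/{n}/hdfs" for n in disks]
    spill_dirs = [f"{mount_root}/{n}/mapred" for n in disks]
    return {
        # HDFS block storage, spread across every spindle
        "dfs.data.dir": ",".join(hdfs_dirs),
        # MR spill space on its own partition of every spindle
        "mapred.local.dir": ",".join(spill_dirs),
    }


# Print the two property values for a 4-disk node.
for name, value in disk_layout(4).items():
    print(f"{name} = {value}")
```

Because the spill directories sit on their own partitions, the partition size itself acts as the hard limit on MR space; nothing in the snippet enforces it.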