Date: Wed, 30 Sep 2009 22:46:14 +0000
Message-ID: <6e8dca540909301546r7c45f190ifb900219d5017819@mail.gmail.com>
References: <25667905.post@talk.nabble.com> <45f85f70909291059t10eb8b61md4c383f3facb183b@mail.gmail.com> <4AC368E4.30406@deri.org>
Reply-To: common-user@hadoop.apache.org
Subject: Re: Advice on new Datacenter Hadoop Cluster?
From: Kevin Sweeney
To: common-user@hadoop.apache.org

I really appreciate everyone's input. We've been going back and forth on the
server size issue here. There are a few reasons we shot for the $1k price.
One is that we wanted to be able to compare our datacenter costs with cloud
costs. Another is that we have spec'd out a fast Intel node with
over-the-counter parts. We have a hard time justifying the dual-processor
costs and really don't see the need for the big server extras like
out-of-band management and redundancy.

This is our proposed config, feel free to criticize :)

Supermicro 512L-260 Chassis    $90
Supermicro X8SIL               $160
Heatsink                       $22
Intel 3460 Xeon                $350
Samsung 7200 RPM SATA2         2 x $85
2GB Non-ECC DIMM               4 x $65

This totals $1052. Doesn't this seem like a reasonable setup? Isn't the
purpose of a Hadoop cluster to build cheap, fast, replaceable nodes?

On Wed, Sep 30, 2009 at 9:06 PM, Ted Dunning wrote:

> 2TB drives are just now dropping to parity with 1TB on a $/GB basis.
>
> If you want space rather than speed, this is a good option. If you want
> speed rather than space, more spindles and smaller disks are better.
> Ironically, 500GB drives now often cost more than 1TB drives (that is $,
> not $/GB).
>
> On Wed, Sep 30, 2009 at 7:33 AM, Patrick Angeles wrote:
>
> > We went with 2 x Nehalems, 4 x 1TB drives and 24GB RAM. The RAM might be
> > overkill... but it's DDR3 so you get either 12 or 24GB. Each box has 16
> > virtual cores so 12GB might not have been enough. These boxes are around
> > $4k each, but can easily outperform any $1K box dollar per dollar (and
> > performance per watt).
> >
> > If you're extremely I/O bound, you can get single-socket configurations
> > with the same amount of drive spindles for really cheap (~$2k for single
> > proc, 8-12GB RAM, 4x1TB drives).
> >
> > On Wed, Sep 30, 2009 at 10:19 AM, stephen mulcahy wrote:
> >
> > > Todd Lipcon wrote:
> > >
> > >> Most people building new clusters at this point seem to be leaning
> > >> towards dual quad core Nehalem with 4x1TB 7200RPM SATA and at least
> > >> 8G RAM.
> > >
> > > We went with a similar configuration for a recently purchased cluster
> > > but opted for dual quad core Opterons (Shanghai) rather than Nehalems
> > > and invested the difference in more memory per node (16GB). Nehalems
> > > seem to perform very well on some benchmarks but that performance
> > > comes at a premium. I guess it depends on your planned use of the
> > > cluster but in a lot of cases more memory may be better spent,
> > > especially if you plan on running things like HBase on the cluster
> > > also (which we do).
> > >
> > > -stephen
> > >
> > > --
> > > Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
> > > NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
> > > http://di2.deri.ie http://webstar.deri.ie http://sindice.com
>
> --
> Ted Dunning, CTO
> DeepDyve

--
Kevin Sweeney
Systems Engineer
Yieldex -- www.yieldex.com
(303) 999-7045
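
For anyone who wants to re-check the parts-list arithmetic in the proposed config, here is a minimal sketch in Python. The prices and quantities are the ones quoted in the message; the names `PARTS`, `node_total` and `nodes_per_budget` are made up for illustration, and the $4000 figure is only the rough "around $4k" price mentioned in the thread for a dual-Nehalem box, used here as an assumed budget.

```python
# Parts list as quoted in the message: (description, quantity, unit price in USD).
PARTS = [
    ("Supermicro 512L-260 Chassis", 1, 90),
    ("Supermicro X8SIL", 1, 160),
    ("Heatsink", 1, 22),
    ("Intel 3460 Xeon", 1, 350),
    ("Samsung 7200 RPM SATA2", 2, 85),
    ("2GB Non-ECC DIMM", 4, 65),
]


def node_total(parts):
    """Sum quantity * unit price over all line items."""
    return sum(qty * price for _, qty, price in parts)


def nodes_per_budget(budget, parts):
    """Whole number of such nodes that fit in a given budget (hypothetical helper)."""
    return budget // node_total(parts)


if __name__ == "__main__":
    print(f"Per-node total: ${node_total(PARTS)}")
    # Assumption: 4000 stands in for the "around $4k" box mentioned in the thread.
    print(f"Nodes per $4000: {nodes_per_budget(4000, PARTS)}")
```

This only compares purchase price. It says nothing about the performance-per-dollar or performance-per-watt points raised earlier in the thread.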