Date: Tue, 11 Dec 2007 12:43:20 -0800 (PST)
From: Chris Fellows
Reply-To: hadoop-user@lucene.apache.org
Subject: Re: commodity vs.
 high perf machines: which would you rather
To: hadoop-user@lucene.apache.org
Message-ID: <647793.61759.qm@web81815.mail.mud.yahoo.com>

All of the answers in this thread were critically helpful for management and
for those of us trying to understand Hadoop, the opportunities it offers, and
what kind of hardware we should be looking at. Does this belong in the FAQ?

Thanks

----- Original Message ----
From: Ted Dunning
To: hadoop-user@lucene.apache.org
Sent: Wednesday, November 7, 2007 4:35:58 PM
Subject: Re: commodity vs. high perf machines: which would you rather

For me, I have three configurations available.

A) database class machine with many (>10) fast SAS drives and >10GB memory,
   dual or quad x quad core CPUs. Let's say that this costs about 20K$.

B) generic production machine with 2 x 250GB SATA drives, 4-12GB RAM,
   dual x dual core CPUs (=Dell 1950). Cost is about 2K$.

C) POS beige box machine with 2 x SATA drives of variable size, 4 GB RAM,
   single dual core CPU. Cost is about 1K$.

For a $50K budget, I would take 25x(B) over 50x(C) due to simpler and smaller
admin issues even though cost/performance would be nominally about the same. I
would avoid 2x(A) like the plague.

On 11/7/07 11:56 AM, "Chris Fellows" wrote:

> Hello,
>
> Much of the Hadoop documentation speaks to large clusters of commodity
> machines. There is a debate on our end about which would be better: a small
> number of high performance machines (2 boxes with 4 quad core processors) or X
> number of commodity machines. I feel that disk I/O might be the bottleneck
> with the 2 high perf machines (though I did just read in the FAQ about being
> able to split the dfs-data across multiple drives).
>
> So this is a "which would you rather" question. If you were setting up a
> cluster of machines to perform data rollups/aggregation (and other mapred
> tasks) on files in the 0.25-1TB size range, which would you rather have:
>
> 1. 2 machines, each with 4 quad core processors, with your choice of RAM and
>    number of drives
> 2. 10 (or more) commodity machines (as defined on the Hadoop wiki)
>
> And of course a "why?" would be very helpful.
>
> Thanks!
>
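
The budget comparison in Ted's reply can be sketched as a small calculation. This is only a rough illustration and not something from the thread: the names MachineConfig and cluster_totals are hypothetical, and the core and drive counts for configuration A are assumptions picked from the stated ranges (quad x quad core, and 10 drives as a conservative stand-in for ">10").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineConfig:
    name: str
    unit_cost_usd: int  # approximate price quoted in the message
    cores: int          # CPU sockets x cores per socket
    drives: int         # disk spindles per machine


# Prices come from the message. Where the message gives only a range or a
# bound, the value used here is an assumption (see the per-line comments).
CONFIGS = [
    MachineConfig("A", 20_000, cores=16, drives=10),  # assumed: quad x quad core, 10 drives
    MachineConfig("B", 2_000, cores=4, drives=2),     # dual x dual core, 2 SATA drives
    MachineConfig("C", 1_000, cores=2, drives=2),     # single dual core, 2 SATA drives
]


def cluster_totals(config: MachineConfig, budget_usd: int) -> dict:
    """Return how many whole machines the budget buys and the cluster-wide totals."""
    count = budget_usd // config.unit_cost_usd
    return {
        "config": config.name,
        "machines": count,
        "cores": count * config.cores,
        "drives": count * config.drives,
        "spent_usd": count * config.unit_cost_usd,
    }


if __name__ == "__main__":
    for cfg in CONFIGS:
        t = cluster_totals(cfg, 50_000)
        print(
            f"{t['machines']:>2} x ({t['config']}): "
            f"{t['cores']} cores, {t['drives']} drives, ${t['spent_usd']} spent"
        )
```

Under these assumptions, 25x(B) and 50x(C) both come to about 100 cores, while 2x(A) comes to 32 cores and roughly 20 drives, which is consistent with the concern that disk I/O could limit a two-machine cluster.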