hadoop-common-user mailing list archives

From Chris Fellows <chrisc_fell...@yahoo.com>
Subject Re: commodity vs. high perf machines: which would you rather
Date Tue, 11 Dec 2007 20:43:20 GMT
All of the answers in this thread were critically helpful for management and for those of us
trying to understand hadoop, the opportunities it offers, and what kind of hardware we should
be looking at.

Does this belong in the FAQ?


----- Original Message ----
From: Ted Dunning <tdunning@veoh.com>
To: hadoop-user@lucene.apache.org
Sent: Wednesday, November 7, 2007 4:35:58 PM
Subject: Re: commodity vs. high perf machines: which would you rather

For me, I have three configurations available.

A) database class machine with many (>10) fast SAS drives and >10GB RAM,
dual or quad x quad core CPUs.  Let's say that this costs about 20K$.

B) generic production machine with 2 x 250GB SATA drives, 4-12GB RAM,
x dual core CPUs (=Dell 1950).  Cost is about 2K$.

C) POS beige box machine with 2 x SATA drives of variable size, 4 GB RAM,
single dual core CPU.  Cost is about 1K$.

For a $50K budget, I would take 25x(b) over 50x(c) due to simpler and
smaller admin issues even though cost/performance would be nominally
the same.  I would avoid 2x(a) like the plague.
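
The budget arithmetic behind that comparison can be sketched roughly as
follows. This is only an illustration: the prices and drive counts are the
ones quoted above, the drive count for (A) is an assumption (the message only
says ">10", so 11 is used as a floor), and the names CONFIGS and plan are
hypothetical.

```python
# Prices and drive counts are taken from the three configurations above.
# ASSUMPTION: (A) is given 11 drives, the smallest count matching ">10".
CONFIGS = {
    "A": {"cost_usd": 20_000, "drives": 11},
    "B": {"cost_usd": 2_000, "drives": 2},
    "C": {"cost_usd": 1_000, "drives": 2},
}


def plan(budget_usd, configs=CONFIGS):
    """Return machines bought, total drives, and money spent per config."""
    result = {}
    for name, cfg in configs.items():
        # Only whole machines can be bought, so use integer division.
        machines = budget_usd // cfg["cost_usd"]
        result[name] = {
            "machines": machines,
            "drives": machines * cfg["drives"],
            "spent_usd": machines * cfg["cost_usd"],
        }
    return result


for name, row in plan(50_000).items():
    print(name, row["machines"], "machines,", row["drives"], "drives,",
          row["spent_usd"], "USD spent")
```

Under these assumptions a $50K budget buys 2 of (A), 25 of (B) or 50 of (C),
and the smaller machines end up with more drives in total, which may matter
if disk I/O turns out to be the limiting factor.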

On 11/7/07 11:56 AM, "Chris Fellows" <chrisc_fellows@yahoo.com> wrote:

> Hello,
> Much of the hadoop documentation speaks to large clusters of
> machines. There is a debate on our end about which would be better: a
> number of high performance machines (2 boxes with 4 quad core
> processors) or X number of commodity machines. I feel that disk I/O
> might be the bottleneck with the 2 high perf machines (though I did
> just read in the FAQ about being able to split the dfs-data across
> multiple drives).
> So this is a "which would you rather" question. If you were setting up a
> of machines to perform data rollups/aggregation (and other mapred
> tasks) on files in the .25-1TB size, which would you rather have:
> 1. 2 4 quad core machines with your choice on RAM and number of
> 2. 10 (or more) commodity machines (as defined on the hadoop wiki)
> And of course a "why?" would be very helpful.
> Thanks!
