hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: commodity vs. high perf machines: which would you rather
Date Wed, 07 Nov 2007 21:35:58 GMT

For me, I have three configurations available.

A) Database-class machine with many (>10) fast SAS drives and >10GB memory,
dual or quad x quad-core CPUs.  Let's say that this costs about $20K.

B) Generic production machine with 2 x 250GB SATA drives, 4-12GB RAM, dual
x dual-core CPUs (= Dell 1950).  Cost is about $2K.

C) POS beige-box machine with 2 x SATA drives of variable size, 4GB RAM,
single dual-core CPU.  Cost is about $1K.

For a $50K budget, I would take 25x(B) over 50x(C) because half as many
machines means fewer and simpler admin issues, even though cost/performance
would be nominally about the same.  I would avoid 2x(A) like the plague.
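
The budget arithmetic above can be sketched roughly as follows. This is only
an illustration: the per-machine drive and core counts are assumptions read
off the descriptions (10 drives and 4 x 4 = 16 cores for A, taking the lower
bound on drives and the quad x quad option), and the names Config and
cluster_totals are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Config:
    name: str
    cost_usd: int   # approximate price quoted in the message
    drives: int     # assumed drive count per machine
    cores: int      # assumed core count per machine


# Assumed figures: A uses the ">10 drives" lower bound and quad x quad-core;
# B is dual x dual-core with 2 drives; C is a single dual-core with 2 drives.
CONFIGS = [
    Config("A", 20_000, 10, 16),
    Config("B", 2_000, 2, 4),
    Config("C", 1_000, 2, 2),
]


def cluster_totals(config, budget_usd):
    """Return machine, drive, and core totals purchasable within the budget."""
    count = budget_usd // config.cost_usd
    return {
        "machines": count,
        "drives": count * config.drives,
        "cores": count * config.cores,
    }


if __name__ == "__main__":
    for cfg in CONFIGS:
        print(cfg.name, cluster_totals(cfg, 50_000))
```

Under these assumptions, B and C end up with the same total core count,
B has half as many machines to administer as C, and A has far fewer drive
spindles and cores than either.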

On 11/7/07 11:56 AM, "Chris Fellows" <chrisc_fellows@yahoo.com> wrote:

> Hello,
> Much of the hadoop documentation speaks to large clusters of commodity
> machines. There is a debate on our end about which would be better: a small
> number of high performance machines (2 boxes with 4 quad core processors) or X
> number of commodity machines. I feel that disk I/O might be the bottleneck
> with the 2 high perf machines (though I did just read in the FAQ about being
> able to split the dfs-data across multiple drives).
> So this is a "which would you rather" question. If you were setting up a cluster
> of machines to perform data rollups/aggregation (and other mapred tasks) on
> files in the .25-1TB size range, which would you rather have:
> 1. 2 machines with 4 quad-core processors each, with your choice of RAM and number of drives
> 2. 10 (or more) commodity machines (as defined on the hadoop wiki)
> And of course a "why?" would be very helpful.
> Thanks!
