hadoop-common-user mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: commodity vs. high perf machines: which would you rather
Date Wed, 07 Nov 2007 20:17:57 GMT

Does commodity hardware come with ECC memory? Since Hadoop apps tend to 
move large amounts of data around, ECC memory seems pretty important.

With just two machines, you might be limited, since you would need to run 
multiple components (NameNode, JobTracker, etc.) on one machine.
Two machines seems quite low... I would go with 10, if the cost is the same. 
On the other hand, 10 machines might eat up more power.

Raghu.

Chris Fellows wrote:
> Hello,
> 
> Much of the hadoop documentation speaks to large clusters of commodity machines. There
> is a debate on our end about which would be better: a small number of high performance machines
> (2 boxes with 4 quad core processors) or X number of commodity machines. I feel that disk
> I/O might be the bottleneck with the 2 high perf machines (though I did just read in the
> FAQ about being able to split the dfs-data across multiple drives).
> 
> So this is a "which would you rather" question. If you were setting up a cluster of machines
> to perform data rollups/aggregation (and other mapred tasks) on files in the .25-1TB size,
> which would you rather have:
> 
> 1. 2 machines, each with 4 quad core processors, with your choice of RAM and number of drives
> 2. 10 (or more) commodity machines (as defined on the hadoop wiki)
> 
> And of course a "why?" would be very helpful.
> 
> Thanks!
> 
> 
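For the FAQ point above about splitting the dfs-data across multiple drives: in the Hadoop of this era, that is done by listing several directories in the dfs.data.dir property of hadoop-site.xml. A minimal sketch; the property name matches Hadoop of that period, but the mount-point paths are invented examples:

```xml
<!-- hadoop-site.xml fragment (sketch only; /disk1../disk3 paths are
     hypothetical mount points, one per physical drive). -->
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data</value>
  <!-- The DataNode spreads blocks across all listed directories, so
       putting each directory on a separate physical drive raises the
       node's aggregate disk I/O bandwidth. -->
</property>
```

This is one way the 2-machine option can partially offset its disk I/O disadvantage, though it still cannot match the aggregate spindle count of 10 separate boxes.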

