hadoop-common-user mailing list archives

From Chris Fellows <chrisc_fell...@yahoo.com>
Subject commodity vs. high perf machines: which would you rather
Date Wed, 07 Nov 2007 19:56:54 GMT

Much of the Hadoop documentation speaks to large clusters of commodity machines. There is
a debate on our end about which would be better: a small number of high-performance machines
(2 boxes, each with 4 quad-core processors) or X number of commodity machines. I feel that disk
I/O might be the bottleneck with the 2 high-perf machines (though I did just read in the
FAQ about being able to split the dfs-data across multiple drives).

So this is a "which would you rather" question. If you were setting up a cluster of machines to
perform data rollups/aggregation (and other mapred tasks) on files in the 0.25-1 TB size range,
which would you rather have:

1. 2 machines, each with 4 quad-core processors, with your choice of RAM and number of drives
2. 10 (or more) commodity machines (as defined on the Hadoop wiki)

And of course a "why?" would be very helpful.

