hadoop-common-user mailing list archives

From Patrick Angeles <patrickange...@gmail.com>
Subject Re: hadoop hardware configuration
Date Thu, 28 May 2009 19:21:22 GMT
On Thu, May 28, 2009 at 6:02 AM, Steve Loughran <stevel@apache.org> wrote:

> That really depends on the work you are doing...the bytes in/out to CPU
> work, and the size of any memory structures that are built up over the run.
> With 1 core per physical disk, you get the bandwidth of a single disk per
> CPU; for some IO-intensive work you can make the case for two disks/CPU -one
> in, one out, but then you are using more power, and if/when you want to add
> more storage, you have to pull out the disks to stick in new ones. If you go
> for more CPUs, you will probably need more RAM to go with it.

Just to throw a wrench in the works, Intel's Nehalem architecture takes DDR3
memory, which is installed in sets of three. So for a dual quad-core rig, you
can get either 6 x 2GB (12GB) or 6 x 4GB (24GB) for an extra $500. That's a
big step up in price for extra memory in a slave node. 12GB probably won't be
enough, because the mid-range Nehalems support hyper-threading, so you
actually get up to 16 threads running on a dual quad-core setup.
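
To put numbers on that, here is a rough back-of-the-envelope sketch in
Python. The function name and the two-threads-per-core default are my own
illustrative assumptions, and it ignores whatever the OS and the Hadoop
daemons themselves need:

```python
def gb_per_thread(dimm_count, dimm_gb, sockets, cores_per_socket,
                  threads_per_core=2):
    """Return GB of RAM available per hardware thread.

    threads_per_core=2 assumes hyper-threading is enabled.
    """
    total_gb = dimm_count * dimm_gb
    threads = sockets * cores_per_socket * threads_per_core
    return total_gb / threads


# Dual quad-core Nehalem, 16 hardware threads in total.
print(gb_per_thread(6, 2, sockets=2, cores_per_socket=4))  # 0.75 (12GB)
print(gb_per_thread(6, 4, sockets=2, cores_per_socket=4))  # 1.5  (24GB)
```

So if you really do run one task per hardware thread, the 12GB option
leaves well under 1GB for each of them.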

> Then there is the question of where your electricity comes from, what the
> limits for the room are, whether you are billed on power drawn or quoted PSU
> draw, what the HVAC limits are, what the maximum allowed weight per rack is,
> etc, etc.

We're going to start with cabinets in a co-location facility. Most can
provide 40 amps per cabinet (usable up to 80% load), so you could fit around
30 single-socket servers, or 15 dual-socket servers, in a single rack.
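
For what it's worth, the power budget those counts imply can be sketched
like this. The function name is made up for illustration, and I'm only
working in amps since the supply voltage will depend on the facility:

```python
def amps_per_server(circuit_amps, max_load_fraction, servers):
    """Return the average current budget per server in a cabinet."""
    usable_amps = circuit_amps * max_load_fraction
    return usable_amps / servers


# 40 amps per cabinet at up to 80% load gives 32 usable amps.
print(f"{amps_per_server(40, 0.8, 30):.2f}")  # 1.07 amps, single-socket
print(f"{amps_per_server(40, 0.8, 15):.2f}")  # 2.13 amps, dual-socket
```

In other words, those counts budget roughly 1 amp per single-socket box
and 2 amps per dual-socket box; measure your actual draw under load before
trusting them.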

> I'm a fan of low-joule work, though we don't have any benchmarks yet of the
> power efficiency of different clusters, i.e. the number of MJ used to do a
> terasort. I'm debating doing some single-CPU tests for this on my laptop, as
> the battery knows how much gets used up by some work.
>
>> 4. In planning storage capacity, how much spare disk space should I take
>> into account for 'scratch'? For now, I'm assuming 1x the input data
>> size.
>
> That you should probably be able to determine from experimental work on
> smaller datasets. Some maps can throw out a lot of data, and most reduces do
> actually reduce the final amount.
> -Steve
> (Disclaimer: I'm not making any official recommendations for hardware here,
> just making my opinions known. If you do want an official recommendation
> from HP, talk to your reseller or account manager, someone will look at your
> problem in more detail and make some suggestions. If you have any code/data
> that could be shared for benchmarking, that would help validate those
> suggestions)
