hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: hadoop hardware configuration
Date Thu, 28 May 2009 10:02:15 GMT
Patrick Angeles wrote:
> Sorry for cross-posting, I realized I sent the following to the hbase list
> when it's really more a Hadoop question.

This is an interesting question. Obviously, as an HP employee, you should 
assume that I'm biased when I say HP DL160 servers are good value for 
the workers, though our blade systems are very good for high physical 
density, provided you have the power to fill up the rack.

> 2 x Hadoop Master (and Secondary NameNode)
>    - 2 x 2.3Ghz Quad Core (Low Power Opteron -- 2376 HE @ 55W)
>    - 16GB DDR2-800 Registered ECC Memory
>    - 4 x 1TB 7200rpm SATA II Drives
>    - Hardware RAID controller
>    - Redundant Power Supply
>    - Approx. 390W power draw (1.9amps 208V)
>    - Approx. $4000 per unit

I do not know what the advantages of that many cores are on a NN; 
someone needs to do some experiments. I do know you need enough RAM to 
hold the index in memory, and you may want to go for a bigger block size 
to keep the index size down.
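To make that concrete, here is a rough sketch of the RAM-vs-block-size trade-off. The ~150 bytes per namespace object (file or block) is a commonly quoted rule of thumb, not a measured figure, and the workload numbers are illustrative:

```python
# Rough NameNode heap estimate for holding the index in memory.
# BYTES_PER_OBJECT (~150 bytes per file or block) is an assumed
# rule of thumb, not a measured number.

BYTES_PER_OBJECT = 150

def nn_heap_gb(num_files, avg_blocks_per_file):
    """Estimated heap (GB) for the namespace: one object per file
    plus one per block."""
    objects = num_files * (1 + avg_blocks_per_file)
    return objects * BYTES_PER_OBJECT / 2**30

# Doubling the block size halves the number of blocks per file,
# which shrinks the index:
print(nn_heap_gb(1_000_000, 4))  # e.g. 64 MB blocks
print(nn_heap_gb(1_000_000, 2))  # same data in 128 MB blocks
```

The second call reports a smaller heap for the same data, which is the reason a bigger block size helps the NN.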

> 6 x Hadoop Task Nodes
>    - 1 x 2.3Ghz Quad Core (Opteron 1356)
>    - 8GB DDR2-800 Registered ECC Memory
>    - 4 x 1TB 7200rpm SATA II Drives
>    - No RAID (JBOD)
>    - Non-Redundant Power Supply
>    - Approx. 210W power draw (1.0amps 208V)
>    - Approx. $2000 per unit
> I had some specific questions regarding this configuration...

>    1. Is hardware RAID necessary for the master node?

You need a good story to ensure that the loss of a disk on the master 
doesn't lose the filesystem. I like RAID there, but the alternative is 
to push the data out over the network to other storage you trust. That 
could be NFS-mounted RAID storage, or NFS-mounted JBOD. Whatever your 
chosen design, verify that it works before you go live: run the 
cluster, simulate different failures, and see how well the 
hardware/ops team handles them.
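For the push-it-over-the-network option, one sketch is to give dfs.name.dir a comma-separated list of directories; the NN writes its image and edit log to every entry, so including an NFS mount gives you an off-machine copy. The paths below are illustrative, not a recommendation:

```xml
<!-- hdfs-site.xml: the NN mirrors fsimage and edits to each listed
     directory; /mnt/nfs/dfs/nn is a hypothetical NFS mount -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn,/data/2/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>
```

The same failure drills apply: unmount the NFS share or pull a local disk, and confirm the NN keeps running on the surviving copies.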

Keep an eye on where that data goes, because when the NN runs out of 
file storage, the consequences can be pretty dramatic (i.e. the cluster 
doesn't come up unless you edit the edit log by hand).

>    2. What is a good processor-to-storage ratio for a task node with 4TB of
>    raw storage? (The config above has 1 core per 1TB of raw storage.)

That really depends on the work you are doing: the bytes in/out per 
unit of CPU work, and the size of any memory structures that are built 
up over the run.

With 1 core per physical disk, you get the bandwidth of a single disk 
per CPU; for some IO-intensive work you can make the case for two 
disks per CPU, one in, one out, but then you are using more power, and 
if/when you want to add more storage, you have to pull out existing 
disks to stick in new ones. If you go for more CPUs, you will probably 
need more RAM to go with them.
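A back-of-envelope way to see the bandwidth side of this, assuming a ballpark ~75 MB/s sustained rate for a 7200rpm SATA disk (an assumption, not a benchmark):

```python
# How long a full streaming scan takes when each reader is bound to
# one spindle. DISK_MB_S (~75 MB/s for 7200rpm SATA) is an assumed
# ballpark sequential rate.

DISK_MB_S = 75

def scan_hours(total_tb, disks, parallel_readers):
    """Hours to stream total_tb once; effective parallelism is
    capped by the number of spindles."""
    effective = min(disks, parallel_readers)
    seconds = total_tb * 1e6 / (DISK_MB_S * effective)
    return seconds / 3600.0

# 6 worker nodes x 4 disks = 24 spindles scanning 10 TB:
print(round(scan_hours(10, 24, 24), 2))
```

Note that adding map slots beyond the spindle count doesn't change the answer here, which is the sense in which IO-bound jobs care about disks per core rather than cores alone.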

>    3. Am I better off using dual quads for a task node, with a higher power
>    draw? Dual quad task node with 16GB RAM and 4TB storage costs roughly $3200,
>    but draws almost 2x as much power. The tradeoffs are:
>       1. I will get more CPU per dollar and per watt.
>       2. I will only be able to fit 1/2 as much dual quad machines into a
>       rack.
>       3. I will get 1/2 the storage capacity per watt.
>       4. I will get less I/O throughput overall (less spindles per core)

First there is the algorithm itself, and whether you are IO-bound or 
CPU-bound. Most MR jobs that I've encountered are fairly IO-bound: 
without indexes, every lookup has to stream through all the data, so 
it's power-inefficient and IO-limited. But if you are trying to do 
higher-level work than simple lookups, you will be doing more CPU work.

Then there is the question of where your electricity comes from, what 
the limits for the room are, whether you are billed on power drawn or 
quoted PSU draw, what the HVAC limits are, what the maximum allowed 
weight per rack is, etc, etc.

I'm a fan of low-Joule work, though we don't yet have any benchmarks of 
the power efficiency of different clusters: the number of MJ used to do 
a terasort. I'm debating doing some single-CPU tests for this on my 
laptop, as the battery knows how much gets used up by the work.

>    4. In planning storage capacity, how much spare disk space should I take
>    into account for 'scratch'? For now, I'm assuming 1x the input data size.

That you should be able to determine from experimental work on smaller 
datasets. Some maps can throw out a lot of data, while most reduces do 
actually reduce the final amount.
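Wherever you land on the ratio, the scratch space is whatever you point mapred.local.dir at; spreading it over the JBOD disks keeps intermediate IO off any one spindle. Paths here are illustrative:

```xml
<!-- mapred-site.xml: intermediate map output is spread round-robin
     across these directories; one per JBOD disk is typical.
     The /data/N paths are hypothetical. -->
<property>
  <name>mapred.local.dir</name>
  <value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local,/data/4/mapred/local</value>
</property>
```

Whatever multiplier you settle on from the small-scale runs, budget it against these directories rather than the HDFS space.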


(Disclaimer: I'm not making any official recommendations for hardware 
here, just making my opinions known. If you do want an official 
recommendation from HP, talk to your reseller or account manager, 
someone will look at your problem in more detail and make some 
suggestions. If you have any code/data that could be shared for 
benchmarking, that would help validate those suggestions)
