hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Advice on new Datacenter Hadoop Cluster?
Date Thu, 01 Oct 2009 09:49:30 GMT
Kevin Sweeney wrote:
> I really appreciate everyone's input. We've been going back and forth on the
> server size issue here. There are a few reasons we shot for the $1k price,
> one because we wanted to be able to compare our datacenter costs vs. the
> cloud costs. Another is that we have spec'd out a fast Intel node with
> over-the-counter parts. We have a hard time justifying the dual-processor
> costs and really don't see the need for the big server extras like
> out-of-band management and redundancy. This is our proposed config, feel
> free to criticize :)
> Supermicro 512L-260 Chassis $90
> Supermicro X8SIL                  $160
> Heatsink                                $22
> Intel 3460 Xeon                      $350
> Samsung 7200 RPM SATA2   2x$85
> 2GB Non-ECC DIMM              4x$65
> 
> This totals $1052. Doesn't this seem like a reasonable setup? Isn't the
> purpose of a Hadoop cluster to build cheap, fast, replaceable nodes?
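
As a quick check of the arithmetic in the quoted parts list, here is a 
minimal Python sketch. The prices and quantities are taken from the list 
above; the function names and the per_node_overhead parameter (a 
placeholder for costs such as switch ports or power, not a figure from 
this thread) are assumptions for illustration only.

```python
# Parts list quoted above: (item, unit price in USD, quantity).
PARTS = [
    ("Supermicro 512L-260 Chassis", 90, 1),
    ("Supermicro X8SIL", 160, 1),
    ("Heatsink", 22, 1),
    ("Intel 3460 Xeon", 350, 1),
    ("Samsung 7200 RPM SATA2", 85, 2),
    ("2GB Non-ECC DIMM", 65, 4),
]


def node_cost(parts):
    """Sum unit price times quantity over every part in one node."""
    return sum(price * qty for _, price, qty in parts)


def cluster_cost(parts, nodes, per_node_overhead=0):
    """Hardware cost for `nodes` identical nodes.

    per_node_overhead is a hypothetical placeholder for per-node costs
    that the parts list does not cover (network, power, rack space).
    """
    return nodes * (node_cost(parts) + per_node_overhead)


if __name__ == "__main__":
    print(node_cost(PARTS))  # 1052, matching the quoted total
```

Keeping the overhead as an explicit parameter makes it harder to 
compare a bare parts total against a cloud price that already includes 
power and networking.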

Disclaimer 1: I work for a server vendor so may be biased. I will 
attempt to avoid this by not pointing you at HP DL180 or SL170z servers.

Disclaimer 2: I probably don't know what I'm talking about. As far as 
Hadoop is concerned, I'm not sure anyone knows what "the right" 
configuration is.

* I'd consider ECC RAM. On a large cluster, over time, memory errors 
occur; you either notice them or propagate their effects.

* Worry about power, cooling and rack weight.

* Include network costs and the power budget. That means your own 
switch costs, plus bandwidth in and out.

* There are some good arguments in favour of fewer, higher-end machines 
over many smaller ones: less network traffic, and often a higher density.

Cloud-hosted vs owned is an interesting question; I suspect the 
spreadsheet there is pretty complex.

* Estimate how much data you will want to store over time. On S3, those 
costs ramp up fast; in your own rack you can maybe plan to stick in 
an extra 2TB HDD a year from now (space, power, cooling and weight 
permitting), paying next year's prices for next year's capacity.

* Virtual machine management costs are different from physical 
management costs, especially if you don't invest time upfront in 
automating your datacentre software provisioning (custom RPMs, PXE 
preboot, kickstart, etc). With VMs you can almost hand-manage an image 
(naughty, but possible), as long as you have only an image or two to 
push out. Even then, I'd automate, but at a higher level, creating 
images on demand as load/availability sees fit.
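
The storage point above (pay-as-you-go hosting vs buying disks later at 
later prices) can be sketched roughly as follows. This is only an 
illustration: every price and growth figure is an input you supply, 
none comes from this thread, and the function names are hypothetical. 
It ignores power, cooling, bandwidth and request charges, and it 
assumes a replication factor of 3 for the owned cluster.

```python
def cloud_storage_cost(tb_by_month, price_per_tb_month):
    """Total paid to a hosted store billed per TB stored per month.

    tb_by_month: data volume (TB) held in each month.
    price_per_tb_month: assumed flat price; real tariffs vary.
    """
    return sum(tb * price_per_tb_month for tb in tb_by_month)


def owned_storage_cost(tb_by_month, disk_tb, disk_price_by_month,
                       replication=3):
    """Disk spend when drives are bought only once capacity runs out.

    Each purchase is charged at that month's disk price, which models
    "paying next year's prices for next year's capacity".
    replication is an assumption about how many copies are kept.
    """
    disks = 0
    total = 0.0
    for month, tb in enumerate(tb_by_month):
        needed_raw_tb = tb * replication
        # Add drives one at a time until raw capacity covers the need.
        while disks * disk_tb < needed_raw_tb:
            disks += 1
            total += disk_price_by_month[month]
    return total
```

Feeding both functions the same growth estimate gives two totals to 
put side by side; the remaining cost lines (power, cooling, network, 
admin time) still have to be added by hand.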

-Steve


