hadoop-general mailing list archives

From Michel Segel <michael_se...@hotmail.com>
Subject Re: Hadoop Java Versions
Date Tue, 28 Jun 2011 12:27:13 GMT
You're preaching to the choir. :-)

With Sandy Bridge, you're going to start seeing 10GbE on the motherboard.
We built our clusters using 1U boxes, where you're stuck with four 3.5" drives. With a larger
chassis, you can fit an additional controller card and more drives.

More drives reduce the disk bottleneck, which means your performance will instead be throttled
by your network and the amount of memory.

I priced out a couple of vendors, and when you build out your boxes the magic number is
$10,000 USD per data node (budget that amount per node). Moore's Law doesn't drop the price,
but it gets you more bang for your buck. Note that this magic number is pre-discount and YMMV.
[This also gets into what is meant by commodity hardware.]

I agree that 10GbE is a necessity, and I have been looking at it for the past two years, only
to be shot down by my client's IT group. I agree that Cisco's ToR switches are expensive;
however, Arista and Blade Network Technologies make switches that claim to be Cisco-friendly
and aren't too pricey, somewhere around $10K a box (again, YMMV).

If you want to upgrade existing boxes, you will probably want to look at Solarflare cards.


Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 28, 2011, at 4:59 AM, Steve Loughran <stevel@apache.org> wrote:

> On 28/06/11 04:49, Segel, Mike wrote:
>> Hmmm. I could have sworn there was a background balancing bandwidth limiter.
> There is, for the rebalancer. Node outages are taken more seriously, though there have
> been problems in past 0.20.x releases where there was a risk of a cascade failure on a big
> switch/rack failure. The risk has been reduced, though we all await field reports to confirm this :)
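
For reference, the rebalancer limiter is a per-datanode setting in hdfs-site.xml. The key name
below is the one from the 0.20.x line, and the 10 MB/s value is purely illustrative:

```xml
<!-- hdfs-site.xml: cap how much bandwidth each datanode may spend
     on rebalancing. The default is 1048576 bytes/s (1 MB/s). -->
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>10485760</value> <!-- 10 MB/s; illustrative, tune per cluster -->
</property>
```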
> You can get 12-24 TB in a server today, which means the loss of a server generates a
> lot of re-replication traffic, which argues for 10GbE.
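
A back-of-envelope sketch of why that drive density argues for 10GbE. The node count and the
70% usable-link figure below are assumptions for illustration, not measurements:

```python
# Rough estimate: hours to re-replicate a dead node's blocks, assuming
# the copies spread evenly and in parallel across the surviving nodes.
def rereplication_hours(data_tb, receiving_nodes, link_gbps, efficiency=0.7):
    data_bits = data_tb * 8e12                        # decimal TB -> bits
    usable_bps = receiving_nodes * link_gbps * 1e9 * efficiency
    return data_bits / usable_bps / 3600

# A 24 TB node, 19 surviving receivers (assumed), 70% usable link:
print(rereplication_hours(24, 19, 1))    # ~4 hours on 1GbE
print(rereplication_hours(24, 19, 10))   # ~24 minutes on 10GbE
```

The absolute numbers depend heavily on disk speed and replication pipeline behaviour; the point
is the factor-of-ten gap between the two link speeds.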
> But:
> - big increase in switch cost, especially if you (CoI warning) go with Cisco
> - there have been problems with things like BIOS PXE and lights-out management on 10GbE,
> probably because the NICs were things the BIOS wasn't expecting and were off the mainboard.
> This should improve.
> - I don't know how well Linux works with Ethernet that fast (field reports useful)
> - the big threat is still ToR switch failure, as that will trigger a re-replication of
> every block in the rack.
> 2x1GbE lets you have redundant switches, albeit at the price of more wiring and more things
> to go wrong with the wiring, etc.
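
One common way to wire that 2x1GbE redundancy is Linux bonding in active-backup mode, with each
NIC cabled to a different switch. A minimal RHEL-style sketch; the device names and address are
hypothetical:

```
# /etc/modprobe.d/bonding.conf
alias bond0 bonding
options bond0 mode=active-backup miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=10.0.0.21          # hypothetical address
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0  (repeat for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes
```

Note that active-backup buys you failover, not 2Gb of aggregate bandwidth; the load-balancing
bond modes are trickier across two switches that aren't stacked.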
> The other thing to consider is how well the "enterprise" switches work in this world:
> with a Hadoop cluster you can really test the claims about how well the switches handle
> every port lighting up at full rate. Indeed, I recommend that as part of your acceptance
> tests for the switch.
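
One way to light up every port at once for such an acceptance test is to schedule pairwise
iperf runs so that each round keeps every host busy. A sketch, assuming passwordless ssh and
an iperf server already running on every host (both are deployment assumptions):

```python
import subprocess

def round_robin_pairs(hosts):
    """Circle-method schedule: in each round every host is paired with
    exactly one other host, so every switch port is active at once;
    across all rounds every host talks to every other host."""
    hosts = list(hosts)
    if len(hosts) % 2:
        hosts.append(None)            # "bye" slot for odd host counts
    n = len(hosts)
    rounds = []
    for _ in range(n - 1):
        pairs = [(hosts[i], hosts[n - 1 - i]) for i in range(n // 2)
                 if hosts[i] is not None and hosts[n - 1 - i] is not None]
        rounds.append(pairs)
        hosts.insert(1, hosts.pop())  # rotate everything but hosts[0]
    return rounds

def run_round(pairs, seconds=30):
    """Launch one iperf client per pair in parallel via ssh."""
    procs = [subprocess.Popen(
                 ["ssh", src, "iperf", "-c", dst, "-t", str(seconds)])
             for src, dst in pairs]
    for p in procs:
        p.wait()
```

Summing the reported throughput per round and comparing it against the switch's claimed
backplane capacity is the actual test; a switch that can't sustain all ports at line rate
will show it here.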
