hadoop-general mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Hadoop Java Versions
Date Tue, 28 Jun 2011 09:59:40 GMT
On 28/06/11 04:49, Segel, Mike wrote:
> Hmmm. I could have sworn there was a background balancing bandwidth limiter.

There is, for the rebalancer. Node outages are taken more seriously, 
though there have been problems in past 0.20.x releases where there was 
a risk of a cascade failure on a big switch/rack failure. The risk has 
been reduced, though we all await field reports to confirm this :)

You can get 12-24 TB in a server today, which means the loss of a server 
generates a lot of re-replication traffic -which argues for 10 GbE.
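
A rough back-of-envelope sketch of why that volume matters. This assumes 
decimal terabytes, a completely full server, and a single link acting as 
the bottleneck at a given utilization. All three are simplifications: 
re-replication is really spread across many nodes and is throttled, so 
treat the numbers as the work one link would have to do, not as predicted 
recovery times. The function name and parameters are made up for 
illustration.

```python
def rereplication_hours(data_tb, link_gbps, utilization=1.0):
    """Hours to move data_tb terabytes over one link of link_gbps Gbit/s.

    Assumes decimal units (1 TB = 1e12 bytes) and that the link is the
    only bottleneck, running at the given fraction of line rate.
    """
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 3600.0


if __name__ == "__main__":
    for data_tb in (12, 24):
        for link_gbps in (1, 10):
            hours = rereplication_hours(data_tb, link_gbps)
            print(f"{data_tb} TB over {link_gbps} Gbit/s: {hours:.1f} h")
```

Under those assumptions 12 TB is roughly a day of traffic on one 1 Gbit/s 
link and under three hours on one 10 Gbit/s link.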

But
  -big increase in switch cost, especially if you (CoI warning) go with 
Cisco
  -there have been problems with things like BIOS PXE and lights out 
management on 10 GbE -probably due to the NICs being things the BIOS 
wasn't expecting and off the mainboard. This should improve.
  -I don't know how well Linux works with Ethernet that fast (field 
reports useful)
  -the big threat is still ToR switch failure, as that will trigger a 
re-replication of every block in the rack.
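
To put a number on the ToR case, here is a minimal sketch. Servers per 
rack, how full the disks are, and the aggregate bandwidth available to 
the surviving racks are all hypothetical inputs chosen for illustration, 
not figures from any real cluster, and the model ignores throttling and 
the fact that blocks with another replica in the same rack still have to 
be copied from elsewhere.

```python
def rack_rereplication(servers, tb_per_server, fill, aggregate_gbps):
    """Return (lost_tb, hours) for re-replicating one rack's worth of blocks.

    servers, tb_per_server: rack size and raw capacity per server
    fill: fraction of that capacity actually holding HDFS blocks (0..1)
    aggregate_gbps: assumed total bandwidth available for re-replication
    """
    lost_tb = servers * tb_per_server * fill
    seconds = lost_tb * 1e12 * 8 / (aggregate_gbps * 1e9)
    return lost_tb, seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical rack: 20 servers x 12 TB, half full, 40 Gbit/s aggregate.
    lost_tb, hours = rack_rereplication(20, 12, 0.5, 40)
    print(f"{lost_tb:.0f} TB to re-replicate, about {hours:.1f} h")
```

The point is only that a rack outage multiplies the single-server volume 
by the number of servers behind the switch.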

2x1 GbE lets you have redundant switches, albeit at the price of more 
wiring, more things to go wrong with the wiring, etc.

The other thing to consider is how well the "enterprise" switches work 
in this world -with a Hadoop cluster you can really test the claims 
about how well the switches handle every port lighting up at full rate. 
Indeed, I recommend that as part of your acceptance tests for the switch.


