hadoop-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steve Loughran <ste...@apache.org>
Subject Re: 1Gb vs 10Gb eth (WAS: Re: Hadoop Java Versions)
Date Fri, 01 Jul 2011 15:43:38 GMT
On 01/07/2011 07:41, Evert Lammerts wrote:
>>> Keeping the amount of disks per node low and the amount of nodes high
>>> should keep the impact of dead nodes in control.
>> It keeps the impact of dead nodes in control but I don't think thats
>> long-term cost efficient.  As prices of 10GbE go down, the "keep the node
>> small" arguement seems less fitting.  And on another note, most servers
>> manufactured in the last 10 years have dual 1GbE network interfaces.  If one
>> were to go by these calcs:
>>> 150 nodes with four 2TB disks each, with HDFS 60% full, it takes around ~32
>> minutes to recover
>> It seems like that assumes a single 1GbE interface, why  not leverage the
>> second?
> I don't know how others setup up their clusters. We have the tradition that every node
in a cluster has at least three interfaces - one for interconnects, one for a management network
(only reachable from within our own network and the primary interface for our admins, accessible
only through a single management node) and one for ILOM, DRAC or whatever lights out manager
is available. This doesn't leave us room for bonding interfaces on off the shelf nodes. Plus
- you'd need twice as many ports in your switch.

Yes, I didn't get into ILO. That can be 100Mbps.
> In the case of Hadoop we're considering adding a fourth NIC for external connectivity.
We don't want people interacting with HDFS from outside while jobs are using the interconnects.
> Of course the choice for 1 or 10Gb ethernet is a function of price. As 10Gb ethernet
prices are approaching that of 1Gb ethernet it gets more attractive. The recovery times scale
linearly with ethernet speed, so 1Gb ethernet compared to 2Gb bonded ethernet or 10Gb ethernet
makes quite a difference. I'm just saying that since we have other variables to tweak - amount
of disks and number of nodes - we can limit the impact of minimizing recovery times.

2Gb bonded with 2x ToR removes a single ToR switch as an SPOF for the 
rack but increases install/debugging costs. More wires, and your network 
configuration topology has got worse.

> Another thing to consider is that as 10Gb ethernet gets cheaper, it gets more attractive
to stop using HDFS (or at least, data locality) and start using an external storage cluster.
Compute node failure then has no impact, disk failure is hardly noticed by compute nodes.
But this is really still very far from as cheap as many small nodes with relatively little
disks - I really like the bang for buck you get with Hadoop :-)

Yeah, but you are a computing facility whose backbone gets data off the 
LHC tier 1 sites abroad faster than the seek time of the disks in your 
neighbouring building...

View raw message