hadoop-general mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: 1Gb vs 10Gb eth (WAS: Re: Hadoop Java Versions)
Date Fri, 01 Jul 2011 17:14:25 GMT
You do use twice the ports, but if you know that the management port is not
used for serious data, then you can put a SOHO-grade switch on those ports
at negligible cost.

There is a serious conflict of goals here if you have software that can make
serious use of more than one NIC.  On the one hand, it is nice to use the
hardware you have.  On the other, it is really nice to have guaranteed
bandwidth for management.

With traditional switch-level link aggregation this is usually not a problem,
since each flow is committed to one NIC or the other, resulting in poor
balancing.  The silver lining of the poor balancing is that there is always
plenty of bandwidth left for administrative functions.
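
For intuition, here is a minimal Java sketch of the hash-based link selection
a switch typically does for an aggregated link.  The field choice and hash are
illustrative assumptions, not any particular vendor's policy, but they show why
a single flow never spreads across links:

// Minimal sketch (assumed hash and fields) of how a link-aggregating switch
// pins each flow to one physical link: the flow's addressing fields are hashed
// and the hash picks the egress NIC, so any one flow always stays on one link.
import java.util.Objects;

public class LagHashDemo {

    // Pick an egress link for a flow by hashing its 5-tuple-style fields.
    static int pickLink(String srcIp, String dstIp, int srcPort, int dstPort, int numLinks) {
        int h = Objects.hash(srcIp, dstIp, srcPort, dstPort);
        return Math.floorMod(h, numLinks);
    }

    public static void main(String[] args) {
        int numLinks = 2;
        // One bulk transfer is a single flow: every packet hashes to the same link,
        // so it can never use more than one NIC's worth of bandwidth.
        System.out.println("bulk copy -> link "
                + pickLink("10.0.0.1", "10.0.0.2", 50010, 40000, numLinks));
        // A separate admin flow (ssh, ILO, etc.) may land on either link, but the
        // bulk flow above can never crowd out both links at once.
        System.out.println("admin session -> link "
                + pickLink("10.0.0.1", "10.0.0.3", 22, 41000, numLinks));
    }
}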

Though it may be slightly controversial to mention, the way that MapR does
application-level NIC bonding could cause some backlash from ops staff
because it can actually saturate as many NICs as you make available.
 Generally, management doesn't require much bandwidth, and things like
reloading the BIOS are usually done when a machine is out of service for
maintenance, but the potential for surprise is there.
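
To make the contrast concrete, here is a rough Java sketch of the
application-level bonding idea: one TCP connection bound to each local
interface, with chunks round-robined across them so a single transfer can
load every NIC at once.  The class and method names are made up for
illustration; this is not MapR's actual code.

// Hypothetical sketch of application-level NIC bonding: open one connection
// per local interface and stripe the data across all of them.
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

public class MultiNicSender {

    public static void sendStriped(List<InetAddress> localNics,
                                   InetSocketAddress remote,
                                   byte[] data, int chunkSize) throws Exception {
        Socket[] sockets = new Socket[localNics.size()];
        for (int i = 0; i < sockets.length; i++) {
            sockets[i] = new Socket();
            // Binding to a specific local address pushes this connection's
            // traffic out that NIC (assuming per-interface routing is set up).
            sockets[i].bind(new InetSocketAddress(localNics.get(i), 0));
            sockets[i].connect(remote);
        }
        // Round-robin the chunks over all connections; with enough data in
        // flight, every NIC stays busy at the same time.
        for (int off = 0, i = 0; off < data.length; off += chunkSize, i++) {
            int len = Math.min(chunkSize, data.length - off);
            OutputStream out = sockets[i % sockets.length].getOutputStream();
            out.write(data, off, len);
        }
        for (Socket s : sockets) {
            s.close();
        }
    }
}

Nothing in a scheme like this reserves headroom for management traffic, which
is exactly the ops concern above.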

I have definitely seen a number of conditions where service access is a
complete godsend.  With geographically dispersed data centers, it is an
absolute requirement because you just can't staff every data center with
enough hands-on admins within a few minutes' travel time of the data center.
 ILO (or DRAC, as Dell calls it) gives you 5-minute response time for fixing
totally horked machines.

On Fri, Jul 1, 2011 at 8:47 AM, Steve Loughran <stevel@apache.org> wrote:

> On 01/07/2011 08:16, Ryan Rawson wrote:
>
>> What's the justification for a management interface? Doesn't that increase
>> complexity? Also, don't you still need twice the ports?
>>
>
> ILO reduces ops complexity. You can push things like BIOS updates out, boot
> machines into known states, instead of the slowly diverging world that RPM
> or Debian updates get you into (the final state depends on the order, and
> different machines end up applying them in a different order), and it helps
> in diagnosing problems when even the root disk doesn't want to come out and
> play.
>
> In a big cluster you need to worry about things like not powering on a
> quadrant of the site simultaneously, since boot time can be a peak power surge;
> you may want to bring up slices of every rack gradually, upgrade the BIOS
> and OS, and gradually ramp things up. This is where HPC admin tools differ
> from classic "lock down an end-user Windows PC" tooling.
>
>
