hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Zeroconf for hadoop
Date Tue, 27 Jan 2009 10:48:01 GMT
Edward Capriolo wrote:
> Zeroconf is more focused on simplicity than security. One of the
> original problems, which may since have been fixed, is that any program
> can announce any service, i.e. my laptop can announce that it is the
> DNS for google.com, etc.

-1 to zeroconf as it is way too chatty. Every DNS lookup is multicast, so 
on a busy network a lot of CPU time is spent discarding requests. Nor does 
it handle failure that well. It's fine on a home LAN for finding a music 
player, but not what you want for an HA infrastructure in the datacentre.
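
For anyone who hasn't played with it, here is a rough sketch of what 
zeroconf announcement and browsing looks like, using the JmDNS library 
(not anything in Hadoop); the "_namenode._tcp.local." service type, the 
instance name and the port are all made up for illustration. Every 
announcement and every browse below goes out as multicast, which is 
exactly the chattiness problem:

  import java.net.InetAddress;
  import javax.jmdns.JmDNS;
  import javax.jmdns.ServiceEvent;
  import javax.jmdns.ServiceInfo;
  import javax.jmdns.ServiceListener;

  public class MdnsSketch {
    public static void main(String[] args) throws Exception {
      JmDNS jmdns = JmDNS.create(InetAddress.getLocalHost());

      // Announce a (hypothetical) namenode service. Nothing stops any
      // other box on the LAN announcing the same thing, which is the
      // security problem mentioned above.
      ServiceInfo info = ServiceInfo.create(
          "_namenode._tcp.local.", "my-namenode", 8020, "path=/");
      jmdns.registerService(info);

      // Browse for the same type; every query and response here is
      // multicast, so every host on the subnet has to filter it.
      jmdns.addServiceListener("_namenode._tcp.local.", new ServiceListener() {
        public void serviceAdded(ServiceEvent event) {
          System.out.println("found: " + event.getName());
        }
        public void serviceRemoved(ServiceEvent event) { }
        public void serviceResolved(ServiceEvent event) {
          System.out.println("resolved on port " + event.getInfo().getPort());
        }
      });

      Thread.sleep(10000);
      jmdns.close();
    }
  }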

Our LAN discovery tool, Anubis, uses mcast only to do the initial 
discovery; the nodes then vote to select a nominated server that everyone 
unicasts to from that point on. Failure of that node, or a network 
partition, triggers a rebinding.

See: http://wiki.smartfrog.org/wiki/display/sf/Anubis ; the paper 
discusses some of the fun you have, though it doesn't cover the 
clock-drift issues you can encounter when running Xen- or 
VMware-hosted nodes.
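
To be clear on how that differs from zeroconf: multicast is only used for 
the initial "who is out there?" probe, after which everything is unicast 
to the nominated server. The following is not the Anubis API, just a toy 
sketch of that bootstrap pattern; the multicast group and port are made 
up, and the "election" here is simply lowest-address-wins rather than a 
real voting protocol:

  import java.net.DatagramPacket;
  import java.net.InetAddress;
  import java.net.MulticastSocket;
  import java.util.TreeSet;

  // Toy illustration of "multicast for bootstrap only"; not Anubis.
  public class DiscoverySketch {
    public static void main(String[] args) throws Exception {
      InetAddress group = InetAddress.getByName("239.1.2.3");
      MulticastSocket socket = new MulticastSocket(4446);
      socket.joinGroup(group);

      // 1. Announce ourselves once over multicast.
      byte[] hello = "HELLO".getBytes("US-ASCII");
      socket.send(new DatagramPacket(hello, hello.length, group, 4446));

      // 2. Listen for a few seconds and collect every peer that announces.
      TreeSet<String> peers = new TreeSet<String>();
      socket.setSoTimeout(1000);
      long deadline = System.currentTimeMillis() + 5000;
      while (System.currentTimeMillis() < deadline) {
        try {
          DatagramPacket p = new DatagramPacket(new byte[64], 64);
          socket.receive(p);
          peers.add(p.getAddress().getHostAddress());
        } catch (java.net.SocketTimeoutException e) {
          // nothing heard in this interval; keep waiting until the deadline
        }
      }
      socket.leaveGroup(group);
      socket.close();

      // 3. Deterministically pick a nominated server (lowest address here);
      //    from now on everyone unicasts to it, and only its failure or a
      //    network partition triggers another round of this.
      System.out.println("nominated server: "
          + (peers.isEmpty() ? "none" : peers.first()));
    }
  }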

> I want to mention a related topic to the list. People are approaching
> auto-discovery in a number of ways on the JIRA. There are a few ways I
> can think of to discover Hadoop. A very simple way might be to publish
> the configuration over a web interface. I use a network storage system
> called GlusterFS. Gluster can be configured so the server holds the
> configuration for each client. If the Hadoop namenode held the entire
> configuration for all the nodes, each node would only need to be
> aware of the namenode and could retrieve its configuration from it.
> Having a central configuration-management or discovery system would
> be very useful. HOD is, I think, the closest thing, though it is more
> of a top-down deployment system.
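
Publishing the configuration over HTTP would certainly be easy to 
prototype. Here is a rough sketch of the client side, assuming the 
namenode served its configuration as an XML resource at some URL (the 
address and /conf path below are invented), using 
Configuration.addResource(URL) to layer it over the local defaults:

  import java.net.URL;
  import org.apache.hadoop.conf.Configuration;

  // Sketch: pull the cluster configuration from a central HTTP endpoint
  // instead of shipping a hadoop-site.xml to every node. The URL is
  // hypothetical.
  public class RemoteConfSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      conf.addResource(new URL("http://namenode.example.com:50070/conf"));
      System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
      System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
    }
  }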

Allen is a fan of a well-managed cluster; he pushes out Hadoop as RPMs 
via PXE and Kickstart and uses LDAP as the central CM tool. I am 
currently exploring bringing up virtual clusters by:
  * pushing the relevant RPMs out to all nodes, with the same files/conf 
for every node,
  * having custom configs for the Namenode and JobTracker; everything else 
becomes a Datanode with a TaskTracker bound to the masters.
I will start worrying about discovery afterwards, because without the 
ability for the JobTracker or Namenode to fail over to a fallback 
JobTracker or Namenode, you don't really need much in the way of dynamic 
cluster binding.
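
For what it's worth, the per-node delta really is tiny: once the RPMs are 
identical everywhere, the only thing a worker needs to know is where the 
two masters live. A sketch of generating that shared site file (hostnames 
invented; Configuration.writeXml may not exist in the release you are 
running):

  import java.io.FileOutputStream;
  import org.apache.hadoop.conf.Configuration;

  // Sketch: build the single hadoop-site.xml that gets pushed to every
  // worker node; only the two master addresses matter.
  public class SiteConfSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration(false); // skip local defaults
      conf.set("fs.default.name", "hdfs://namenode.example.com:8020/");
      conf.set("mapred.job.tracker", "jobtracker.example.com:9001");
      FileOutputStream out = new FileOutputStream("hadoop-site.xml");
      conf.writeXml(out);
      out.close();
    }
  }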

