hadoop-common-user mailing list archives

From Jim Kellerman <...@powerset.com>
Subject Re: HMaster SPOF?
Date Mon, 30 Apr 2007 06:23:56 GMT
You are absolutely correct that the HMaster is currently a single point
of failure, just as the name node has been in an HDFS cluster. Work has
been done on HDFS to create a backup name node, and eliminating the
HMaster as a SPOF will be a focus in the future (first we have to get
the HMaster, the HRegionServer and the client to work).

The thing that makes a hot HMaster or HDFS backup name node difficult
is the lack of a distributed lock manager (like Google's Chubby). A
distributed lock manager project has been proposed on the Hadoop Wiki
(see http://wiki.apache.org/lucene-hadoop/DistributedLockServer for the
project outline).
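
As a rough illustration of why a lock manager matters for a hot standby,
here is a minimal single-process sketch in Python. It is not Chubby's API
and not anything that exists in Hadoop or HBase; LockService, Master,
try_acquire and heartbeat are all hypothetical names, and the "lock
service" is just an in-memory object with an injectable clock. The idea
is that only the holder of a time-limited lease acts as master, and a
standby takes over once that lease expires.

```python
class LockService:
    """Toy in-memory stand-in for a distributed lock manager (hypothetical)."""

    def __init__(self, clock):
        self._clock = clock        # callable returning the current time in seconds
        self._holder = None        # name of the current lease holder, if any
        self._expires_at = 0.0     # time at which the current lease lapses

    def try_acquire(self, owner, lease_seconds):
        """Grant or renew the lease if it is free, expired, or already ours."""
        now = self._clock()
        free = self._holder is None or now >= self._expires_at
        if free or self._holder == owner:
            self._holder = owner
            self._expires_at = now + lease_seconds
            return True
        return False

    def holder(self):
        """Return the current lease holder, or None if the lease has lapsed."""
        return self._holder if self._clock() < self._expires_at else None


class Master:
    """A master candidate that is active only while it holds the lease."""

    def __init__(self, name, locks, lease_seconds=10.0):
        self.name = name
        self.locks = locks
        self.lease_seconds = lease_seconds
        self.alive = True
        self.active = False

    def heartbeat(self):
        """Try to take or renew the lease; a dead process never renews."""
        if not self.alive:
            self.active = False
            return False
        self.active = self.locks.try_acquire(self.name, self.lease_seconds)
        return self.active
```

In this sketch a standby simply calls heartbeat() periodically. While the
primary keeps renewing, the standby is refused. If the primary dies, its
lease lapses and the standby's next heartbeat succeeds. A real lock
manager also has to survive its own failures and deal with clock skew and
network partitions, which this toy ignores.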

To date, the focus has been on getting HBase functional in a distributed
environment at all (right now it runs only in a single process; see
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture for the
latest update on the HBase project), and no one has volunteered to take
on the distributed lock manager project. If someone would like to step
up and start driving the lock manager project, that would benefit both
Hadoop's and HBase's failover capabilities.


On Sun, 2007-04-29 at 20:02 -0700, Otis Gospodnetic wrote:
> Hi,
> I've read http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture and it sounds
> mostly wonderful! However, I am wondering about this: "Since the death of the HMaster means
> the death of the entire system, there's no reason to store this information on disk." Are
> there plans to change this, so that HMaster is no longer a SPOF?
> Thanks,
> Otis
>  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
Jim Kellerman, Senior Engineer; Powerset
