hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: HBase High Availability
Date Wed, 25 Nov 2009 09:55:20 GMT
With multiple masters, the election is mediated by ZooKeeper, and the
idle masters wait for the next re-election cycle.
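
To make the election idea above concrete, here is a toy in-memory model
of an active master with idle standbys. It is only a sketch: the
ToyCoordinator class and the master names are hypothetical, it does not
use the ZooKeeper API, and it is not how the HBase master is actually
implemented.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service (hypothetical, not ZooKeeper)."""

    def __init__(self):
        self.active = None   # name of the current active master, if any
        self.standbys = []   # idle masters waiting for the next election

    def register(self, name):
        """A master tries to claim the active slot; losers wait as standbys."""
        if self.active is None:
            self.active = name
            return True
        self.standbys.append(name)
        return False

    def fail(self, name):
        """Simulate a master dying; if it was active, promote the first standby."""
        if name == self.active:
            self.active = self.standbys.pop(0) if self.standbys else None
        elif name in self.standbys:
            self.standbys.remove(name)
        return self.active


coord = ToyCoordinator()
for master in ("master-a", "master-b", "master-c"):
    coord.register(master)

first_active = coord.active          # the first master to register wins
new_active = coord.fail("master-a")  # a standby takes over after the failure
```

In a real deployment the coordination service would detect the failure
itself (for example through session expiry) rather than being told
about it.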

The problem with bringing regions up after a failure isn't the actual
speed of loading them, but bugs in the master.  This is being fixed
in 0.21, which will allow us to bring regions back online much more
rapidly after a failure.

As for loading a region across multiple servers, this would have to be
thought about quite carefully to see if it is possible. Right now
there is a substantial amount of state loaded that would be changed by
other servers, and you would still have to reload that state anyways.

We also need to ask ourselves, what does "availability" mean anyways?
For example, if a regionserver failed, does that mean hbase is
offline? The answer would have to be a "no", but certain sections of
data might be offline temporarily. Thus HBase has 100% uptime by this
definition, correct?
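
As a rough illustration of that "partial availability" view, the sketch
below computes the fraction of data still being served while some
regionservers are down. It assumes regions are spread evenly across
servers, which is my simplification, and the cluster size is a made-up
number.

```python
def fraction_online(total_servers, failed_servers):
    """Fraction of regions still served, assuming an even spread of regions."""
    if total_servers <= 0:
        raise ValueError("total_servers must be positive")
    # Clamp so we never count more failures than there are servers.
    failed = min(max(failed_servers, 0), total_servers)
    return (total_servers - failed) / total_servers


# Hypothetical 20-node cluster with one regionserver down.
one_down = fraction_online(20, 1)
all_down = fraction_online(20, 20)
```

Under these assumptions a single failure leaves most of the data
reachable until the affected regions are reassigned.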

In the annals of distributed computing, you are only protected, with
minimal downtime, against a limited number of hardware failures.  Once
you take out too many nodes, things start failing; that is a given.
HBase solves the data scalability problem, and it solves the limited
machine failure problem.

I highly suggest this presentation:
http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf

BTW, what is your budget for "near 100% uptime" anyways?  How many
datacenters did you plan on using?

On Wed, Nov 25, 2009 at 1:31 AM, Murali Krishna. P
<muralikpbhat@yahoo.com> wrote:
> Hi,
>    This is regarding the region unavailability when a region server
> goes down. There will be cases where we have thousands of regions per
> RS and it takes considerable amount of time to redistribute the
> regions when a node fails. The service will be unavailable during
> that period. I am evaluating HBase for an application where we need
> to guarantee close to 100% availability (namenode is still SPOF,
> leave that).
>
>    One simple idea would be to replicate the regions in memory. Can
> we load the same region in multiple region servers? I am not sure
> about the feasibility yet, there will be issues like consistency
> across these in memory replicas. Wanted to know whether there were
> any thoughts / work already going on this area? I saw some related
> discussion here
> http://osdir.com/ml/hbase-user-hadoop-apache/2009-09/msg00118.html,
> not sure what is the state.
>
>  Same needs to be done with the master as well or is it already done
> with ZK? How fast is the master re-election and catalog load
> currently ? Do we always have multiple masters in ready to run state?
>
>
>  Thanks,
> Murali Krishna
>
