hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2357) Add read-only region replicas (slaves) for availability and fast region recovery
Date Sat, 27 Mar 2010 22:13:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850588#action_12850588 ]

Andrew Purtell commented on HBASE-2357:

Writes would be blocked by the slowest member of the clique. But if this scheme spreads (strongly
consistent!) read load out, then, in theory anyway, the probability of hot accesses to a particular
region server starving the write side is lowered accordingly. We could mock it and see what happens,
and/or try to work through some of the particulars formally. Like Ryan, I wonder how slow updates
might get. Consider running ZAB on a 3-node clique and hflush in parallel, with the commit waiting
on a barrier for both to complete. Who wins the race? How often would hflush take longer? It could
be a substantial percentage, especially in an environment loaded with both HBase and plain HDFS
work (MapReduce, Hive, Pig, Cascading, or ...). My point is that it's not clear hflush would not
dominate.
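The race can be mocked with a toy model, assuming the commit waits at a barrier for both the replica acknowledgements and the HDFS hflush, so its latency is the larger of the two. The function names are hypothetical, and the latency distributions are arbitrary placeholders rather than measurements. `required_acks` defaults to the whole clique to match "blocked by the slowest member of the clique"; plain ZAB only needs a majority.

```python
import random
from typing import Optional, Sequence


def replication_latency(ack_ms: Sequence[float], required_acks: Optional[int] = None) -> float:
    """Time until `required_acks` clique members (leader included) have acknowledged.

    Defaults to the whole clique, so the write waits for the slowest member.
    A ZAB-style majority quorum would be len(ack_ms) // 2 + 1.
    """
    needed = len(ack_ms) if required_acks is None else required_acks
    if not 1 <= needed <= len(ack_ms):
        raise ValueError("required_acks must be between 1 and the clique size")
    return sorted(ack_ms)[needed - 1]


def commit_latency(ack_ms: Sequence[float], hflush_ms: float,
                   required_acks: Optional[int] = None) -> float:
    """Replication and hflush start together; the barrier opens when both are done."""
    return max(replication_latency(ack_ms, required_acks), hflush_ms)


def hflush_dominates_fraction(trials: int, clique_size: int = 3,
                              required_acks: Optional[int] = None, seed: int = 0) -> float:
    """Monte Carlo estimate of how often hflush, not replication, sets the commit time.

    The exponential distributions and their means are placeholders; a real mock
    would sample latencies observed on a loaded HBase/HDFS cluster.
    """
    rng = random.Random(seed)
    hflush_slower = 0
    for _ in range(trials):
        acks = [rng.expovariate(1 / 2.0) for _ in range(clique_size)]  # mean 2 ms per ack
        hflush = rng.expovariate(1 / 5.0)  # mean 5 ms per hflush
        if hflush > replication_latency(acks, required_acks):
            hflush_slower += 1
    return hflush_slower / trials


if __name__ == "__main__":
    print(commit_latency([1.0, 2.0, 9.0], hflush_ms=4.0))                   # 9.0: slowest replica wins
    print(commit_latency([1.0, 2.0, 9.0], hflush_ms=4.0, required_acks=2))  # 4.0: hflush wins
    print(hflush_dominates_fraction(10_000))
```

Swapping in latency samples from a real mixed-workload cluster would turn `hflush_dominates_fraction` into a rough answer to "how often would hflush take longer?"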

What I don't like about log shipping is that the read replicas are not going to be useful to
someone who uses HBase for its strong consistency and needs it, except in use cases where one
could accept results that are consistent as of the timestamp of the last replication. (But that
timestamp could differ on each slave, so the master and the slaves might all have different
views!) With a consensus protocol, by contrast, read load can be spread, as this issue intends,
and the data is still strongly consistent.
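The divergent-views point can be made concrete with a toy model of asynchronous log shipping, assuming each slave applies a prefix of the primary's edit log and serves reads from whatever it has applied so far. `Replica`, `apply_up_to`, and the row values are hypothetical and do not correspond to any HBase class.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

LogEntry = Tuple[int, str, str]  # (sequence id, row key, value)


@dataclass
class Replica:
    """Hypothetical replica that applies a shipped edit log up to some position."""
    name: str
    applied_seq: int = 0
    store: Dict[str, str] = field(default_factory=dict)

    def apply_up_to(self, log: List[LogEntry], seq: int) -> None:
        """Apply, in log order, every entry with applied_seq < sequence id <= seq."""
        for entry_seq, key, value in log:
            if self.applied_seq < entry_seq <= seq:
                self.store[key] = value
                self.applied_seq = entry_seq

    def read(self, key: str) -> Optional[str]:
        """Serve a read from whatever this replica has applied so far (possibly stale)."""
        return self.store.get(key)


# Three successive updates to the same row, shipped asynchronously.
log: List[LogEntry] = [(1, "row1", "a"), (2, "row1", "b"), (3, "row1", "c")]

primary = Replica("primary")
slave_a = Replica("slave-a")
slave_b = Replica("slave-b")
primary.apply_up_to(log, 3)  # the primary has every edit
slave_a.apply_up_to(log, 2)  # slave-a is one edit behind
slave_b.apply_up_to(log, 1)  # slave-b is two edits behind

for r in (primary, slave_a, slave_b):
    print(r.name, r.applied_seq, r.read("row1"))
# primary 3 c
# slave-a 2 b
# slave-b 1 a
```

Each replica is consistent as of its own last applied sequence id, yet all three answer the same read differently. This is the divergence a consensus protocol, as argued above, aims to avoid.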

So I might humbly suggest that both ideas have pros and cons, and that neither warrants a -1 or
a +1 at this point, IMO.

> Add read-only region replicas (slaves) for availability and fast region recovery
> --------------------------------------------------------------------------------
>                 Key: HBASE-2357
>                 URL: https://issues.apache.org/jira/browse/HBASE-2357
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: master, regionserver
>            Reporter: Todd Lipcon
> I don't plan on working on this in the short term, but the idea is to extend region ownership
> to have two modes. Each region has one primary region server and N slave region servers. The
> slaves would follow the primary (probably by streaming the relevant HLog entries directly from
> it) and be able to serve stale reads. The benefit is twofold: (a) it provides the ability to
> spread read load, and (b) it enables very fast region failover/rebalance, since the memstore is
> already nearly up to date on the slave RS.
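The failover benefit in (b) can be sketched with a toy model, assuming a slave tails the primary's log and applies each edit to its own memstore. `RegionCopy`, `replay`, and the edit tuples are hypothetical stand-ins, not HBase's HLog or MemStore APIs, and the edit counts are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Edit = Tuple[int, str, str]  # (sequence id, row, value): a stand-in for a log entry


@dataclass
class RegionCopy:
    """Hypothetical in-memory copy of one region: a memstore plus its last applied edit."""
    memstore: Dict[str, str] = field(default_factory=dict)
    applied_seq: int = 0

    def replay(self, log: List[Edit]) -> int:
        """Apply every edit newer than applied_seq and return how many were applied."""
        applied = 0
        for seq, row, value in log:
            if seq > self.applied_seq:
                self.memstore[row] = value
                self.applied_seq = seq
                applied += 1
        return applied


# 1000 edits to ten rows since the region's last flush.
log: List[Edit] = [(i, f"row{i % 10}", f"v{i}") for i in range(1, 1001)]

# While the primary is alive, the slave has streamed all but the last 5 edits.
slave = RegionCopy()
slave.replay(log[:995])

# The primary dies. Promoting the slave means catching up on the tail only,
# whereas a cold region server has to replay everything since the flush.
promoted_replay = slave.replay(log)
cold = RegionCopy()
cold_replay = cold.replay(log)

print(promoted_replay, cold_replay)      # 5 1000
print(slave.memstore == cold.memstore)   # True
```

The promoted slave ends with the same memstore as the cold replay while applying only the edits it had not yet streamed. Until promotion, its reads lag the primary by those same edits, which is the staleness the description accepts.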

