hbase-dev mailing list archives

From Jonathan Hsieh <...@cloudera.com>
Subject Re: [Shadow Regions / Read Replicas ]
Date Tue, 03 Dec 2013 19:51:39 GMT
To keep the discussion focused on the design goals, I'm going to start
referring to Enis and Devaraj's eventually consistent read replicas as the
*read replica* design, and to the consistent fast read recovery mechanism
based on shadowing/tailing the WALs as *shadow regions* or *shadow
memstores*.  Can we agree on this nomenclature?


On Tue, Dec 3, 2013 at 11:07 AM, Enis Söztutar <enis@apache.org> wrote:

> Thanks Jon for bringing this to dev@.
>
>
> On Mon, Dec 2, 2013 at 10:01 PM, Jonathan Hsieh <jon@cloudera.com> wrote:
>
> > Fundamentally, I'd prefer focusing on making HBase "HBasier" instead of
> > tackling a feature that other systems architecturally can do better
> > (inconsistent reads).   I consider consistent reads/writes to be one of
> > HBase's defining features. That said, I think read replicas make sense
> > and are a nice feature to have.
> >
>
> Our design proposal has a specific use case goal, and hopefully we can
> demonstrate the benefits of having this in HBase so that even more pieces
> can be built on top of this. Plus, I imagine this will be a widely used
> feature for read-only tables or bulk-loaded tables. We are not proposing
> to rework strong consistency semantics or to make major architectural
> changes. I think that having tables defined with a replication count,
> together with the proposed client API changes (Consistency definition),
> plugs into the HBase model rather well.
>
>
I do agree, and I think that without any mechanism for keeping replicas
recent, we are limiting the usefulness of this feature to essentially
*only* read-only or bulk-load-only tables.  If there were any
edits/updates, recency would lag severely (by default potentially an hour),
especially in cases where there are only a few edits to a primarily
bulk-loaded table.  This limitation is not mentioned in the tradeoffs or
requirements (or a non-requirements section) and definitely should be
listed there.

With the current design it might be best to have a flag on the table which
marks it read-only or bulk-load only so that it only gets used by users
when the table is in that mode?  (and maybe an "escape hatch" for power
users).
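
To make the flag idea concrete, here is a minimal sketch of the check I
have in mind. Every name in it (TableMode, ReadConsistency,
ReplicaReadPolicy, allowStaleOverride) is made up for illustration and is
not an existing HBase API or anything from the design doc; it only assumes
that a table carries a mode flag and that the client states the consistency
it wants per read.

```java
// Hypothetical sketch only: none of these types exist in HBase.

/** Assumed per-table flag describing how the table is written to. */
enum TableMode { READ_WRITE, READ_ONLY, BULK_LOAD_ONLY }

/** Assumed per-request consistency level chosen by the client. */
enum ReadConsistency { STRONG, EVENTUAL }

final class ReplicaReadPolicy {
    private ReplicaReadPolicy() {}

    /**
     * Decides whether a read may be served by a read replica.
     *
     * @param mode               the table's mode flag
     * @param requested          consistency the client asked for
     * @param allowStaleOverride the "escape hatch": a power user explicitly
     *                           accepts stale reads on a writable table
     */
    static boolean mayUseReplica(TableMode mode,
                                 ReadConsistency requested,
                                 boolean allowStaleOverride) {
        // Strongly consistent reads always go to the primary region.
        if (requested == ReadConsistency.STRONG) {
            return false;
        }
        // Replicas of read-only or bulk-load-only tables do not lag behind
        // individual edits, so eventual reads are safe to serve from them.
        if (mode == TableMode.READ_ONLY || mode == TableMode.BULK_LOAD_ONLY) {
            return true;
        }
        // Writable table: replicas may be badly stale, so require opt-in.
        return allowStaleOverride;
    }
}
```

With this shape, an eventual read against a writable table is refused by
default and only reaches a replica when the caller passes the override.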

[snip]
>
> > - I think the two goals are both worthy on their own, each with its own
> > optimal points.  We should make sure in the design that we can support
> > both goals.
> >
>
> I think our proposal is consistent with your doc, and we have considered
> secondary region promotion
> in the future section. It would be good if you can review and comment on
> whether you see any points
> missing.
>
>
I definitely will. At the moment, I think the hybrid approach for the
WALs/HLogs that I suggested in the other thread seems to be an optimal
solution considering locality.  Though feasible, it is obviously more
complex than either approach alone.


> > - I want to make sure the proposed design has a path for optimal
> > fast-consistent read-recovery.
> >
>
> We think that it does, but it is a secondary goal for the initial work. I
> don't see any reason why secondary promotion cannot be built on top of
> this, once the branch is in a better state.
>

Based on the detail in the design doc and this statement it sounds like you
have a prototype branch already?  Is this the case?

-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com
