hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Considering a completely different administrative approach for Replication v2
Date Tue, 08 Sep 2015 15:59:49 GMT
That's not my motivation for bringing this up. I suppose if you favor a
similarity then it's a happy accident.


On Mon, Sep 7, 2015 at 11:48 PM, Jerry He <jerryjch@gmail.com> wrote:

> Interesting idea.
> Is this meant to provide an alternative 'native' cross-DC replication
> support that is close to Cassandra's?
>
> Jerry
> On Sep 7, 2015 10:44 PM, "Andrew Purtell" <apurtell@apache.org> wrote:
>
> > I opened an umbrella for Replication v2 as HBASE-14379. At the moment it
> > envisions the administration of cross-DC replication relationships and
> > data access as the same as today. However, we do have an opportunity to
> > reboot with a completely different approach. I thought it worth bringing
> > up for discussion.
> >
> > We could in theory reboot around timeline consistent region replicas. If
> > you squint, region replicas have a similar theory of operation to
> > cross-DC replication. What if we redefine administration and data access
> > for Replication v2 as sets of region replica placements that can cross
> > data center boundaries, with the client able to distinguish local
> > locations from remote locations, and then choose based on policy? So if,
> > for example, you have three data centers, then instead of setting up
> > three point-to-point replication peering relationships like today, you'd
> > simply create a table that has a region replica placement policy in its
> > schema with (logical) locations spanning all three data centers. Behind
> > the scenes, each data center would have HBASE-10070 style
> > primary-secondary relationships, and additionally:
> > 1. the primary region will run something like today's replication source
> > for each secondary location that is in a remote DC;
> > 2. a primary region anywhere may receive change streams from remote DCs
> > like today's replication sinks.
> >
> > On the client side we have some prior work in this regard: CSBT, and Ted
> > Malaska's HBase.MCC. I mention CSBT but I don't think we want its
> > partitioning or reliance on ZooKeeper. HBase.MCC is more of a starting
> > point.
> >
> > I'm not saying we should do this, only that we could do this. There are
> > pros and cons. In some ways defining point-to-point replication
> > relationships is easier for admins and users, e.g. the topology is built
> > and managed explicitly. In some ways merging replicas and cross-DC
> > replication is easier, e.g. it removes APIs, necessary tooling, cognitive
> > load (cross-DC replication is no longer 'special').
>
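
The client-side behaviour described in the quoted proposal (distinguish local
replica locations from remote ones, then choose based on policy) could be
sketched roughly as below. This is only an illustration under assumptions:
ReplicaSelection, ReplicaLocation, ReadPolicy and choose() are hypothetical
names invented for this sketch and are not part of the HBase client API, CSBT,
or HBase.MCC.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch only: none of these types exist in HBase.
final class ReplicaSelection {

    // A region replica location tagged with a logical data center name.
    static final class ReplicaLocation {
        final String dataCenter;
        final String host;
        final boolean primary;

        ReplicaLocation(String dataCenter, String host, boolean primary) {
            this.dataCenter = dataCenter;
            this.host = host;
            this.primary = primary;
        }
    }

    // Assumed policies; a real design would likely need more than two.
    enum ReadPolicy { PRIMARY_ONLY, PREFER_LOCAL }

    // Picks one location for a read, given the client's own data center.
    static Optional<ReplicaLocation> choose(List<ReplicaLocation> locations,
                                            String localDc,
                                            ReadPolicy policy) {
        if (policy == ReadPolicy.PRIMARY_ONLY) {
            // Strongly consistent reads go to the primary, wherever it is.
            return locations.stream().filter(l -> l.primary).findFirst();
        }
        // PREFER_LOCAL: local replicas rank first; among equals, the
        // primary ranks ahead of secondaries.
        return locations.stream().min(
            Comparator
                .comparingInt((ReplicaLocation l) -> l.dataCenter.equals(localDc) ? 0 : 1)
                .thenComparingInt(l -> l.primary ? 0 : 1));
    }
}
```

With one primary in "dc1" and secondaries in "dc2" and "dc3", a client in
"dc2" using PREFER_LOCAL would be handed the "dc2" secondary (a possibly stale,
timeline consistent read), while PRIMARY_ONLY would send it across data
centers to "dc1".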
