hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: [PROPOSAL] HBASE-10070 branch
Date Wed, 15 Jan 2014 21:03:29 GMT
On Wed, Jan 15, 2014 at 12:51 PM, Devaraj Das <ddas@hortonworks.com> wrote:

> Some responses inline. Thanks for the inputs.
> On Wed, Jan 15, 2014 at 11:17 AM, Stack <stack@duboce.net> wrote:
> > On Wed, Jan 15, 2014 at 12:44 AM, Enis Söztutar <enis@hortonworks.com> wrote:
> >
> >> Hi,
> >>
> >> I just wanted to give some updates on the HBASE-10070 efforts from the
> >> technical side, and development side, and propose a branch.
> >>
> >> From the technical side:
> >> The changes for region replicas phase 1 are becoming more mature and
> >> stable, and most of the "base" changes are starting to become good
> >> candidates for review. The code has been rebased to trunk, and the main
> >> working repo has been moved to the HBASE-10070 branch at
> >> https://github.com/enis/hbase/tree/hbase-10070.
> >>
> >> An overview of the changes that are working includes:
> >>  - HRegionInfo & MetaReader & MetaEditor changes to support region
> >> replicas
> >>  - HTableDescriptor changes and shell changes for supporting region
> >> replicas
> >>  - WebUI changes to display whether a region is a replica or not
> >>  - AssignmentManager changes coupled with RegionStates & Master changes to
> >> create and assign replicas, and to support alter table, enable table, etc.
> >>
> >
> >
> > Thanks for the writeup.
> >
> > I am late to the game so take my comments w/ a grain of salt -- I'll take a
> > look at HBASE-10070 -- but at a high level, do we have to go the read
> > replicas route?  IMO, having our current, already-strained AssignmentManager
> > code base manage three replicas instead of one will ensure that Jimmy Xiang
> > and Jeffrey Zhong do nothing else for the next year or two but work on the
> > new interesting use cases introduced by this new level of complexity put
> > upon a system that has just achieved a hard-won stability.
> >
> Stack, the model is that the replicas (HRegionInfo with an added field
> 'replicaId') are treated just like any other region in the AM. You can
> see the code - it's not adding much at all in terms of new code to
> handle replicas.
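A minimal Java sketch of the model described above, under stated assumptions: `RegionId` and `Assignments` are hypothetical, simplified stand-ins (not HBase's actual `HRegionInfo` or AssignmentManager), and the sketch assumes replica id 0 denotes the primary. Because `replicaId` takes part in equality, an assignment map keyed on it treats each replica as its own region, which is the sense in which little replica-specific code is needed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified region identifier; NOT HBase's HRegionInfo.
final class RegionId {
    // Assumption: replica id 0 is the primary; higher ids are secondary replicas.
    static final int PRIMARY_REPLICA_ID = 0;

    final String table;
    final String startKey;
    final int replicaId;

    RegionId(String table, String startKey, int replicaId) {
        this.table = table;
        this.startKey = startKey;
        this.replicaId = replicaId;
    }

    boolean isPrimary() {
        return replicaId == PRIMARY_REPLICA_ID;
    }

    // Replicas of one region share table and start key; only replicaId differs.
    RegionId replica(int id) {
        return new RegionId(table, startKey, id);
    }

    boolean sameKeyRangeAs(RegionId other) {
        return table.equals(other.table) && startKey.equals(other.startKey);
    }

    // replicaId is part of equality, so each replica is a distinct key for assignment.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RegionId)) return false;
        RegionId r = (RegionId) o;
        return replicaId == r.replicaId
                && table.equals(r.table)
                && startKey.equals(r.startKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(table, startKey, replicaId);
    }
}

// Hypothetical assignment table: region (including its replica id) -> server name.
final class Assignments {
    private final Map<RegionId, String> regionToServer = new HashMap<>();

    void assign(RegionId region, String server) {
        regionToServer.put(region, server);
    }

    int size() {
        return regionToServer.size();
    }
}
```

Assigning a primary and its secondary yields two independent entries, handled by the same code path as any two unrelated regions.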
I'm getting there.  Will check it out.

> > A few of us chatting offline -- Jimmy, Jon, Elliott, and I -- were
> > wondering if you couldn't solve read replicas in a more hbase 'native'
> > way* by just bringing up three tables -- a main table and then two snapshot
> > clones with the clones refreshed on a period (via snapshot or via
> > in-cluster replication) -- and then a shim on top of an HBase client would
> > read from the main table until failure and then from a snapshot until the
> > main came back.  Reads from snapshot tables could be marked 'stale'.  You'd
> > have to modify the balancer so the tables -- or at least their regions --
> > were physically distinct... you might be able to just have the three tables
> > each in a different namespace.
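The client-side shim proposed above might look roughly like the following sketch. `ReadableTable`, `FailoverReader`, and `ShimResult` are hypothetical names, not HBase client classes; a real shim would wrap the HBase client's table interface and would need its own failure detection and retry policy. Each read tries the main table first, so reads return to the main table as soon as it comes back.

```java
import java.util.Optional;

// Hypothetical minimal table abstraction; not the HBase client API.
interface ReadableTable {
    // Returns the row's value (empty if absent); throws if the table is unreachable.
    Optional<String> get(String row) throws Exception;
}

// A read served by a snapshot clone rather than the main table is flagged stale.
final class ShimResult {
    final Optional<String> value;
    final boolean stale;

    ShimResult(Optional<String> value, boolean stale) {
        this.value = value;
        this.stale = stale;
    }
}

// Shim layered above the client: main table first, then each clone in order.
final class FailoverReader {
    private final ReadableTable main;
    private final ReadableTable[] clones;

    FailoverReader(ReadableTable main, ReadableTable... clones) {
        this.main = main;
        this.clones = clones;
    }

    ShimResult get(String row) throws Exception {
        try {
            return new ShimResult(main.get(row), false);
        } catch (Exception mainFailure) {
            for (ReadableTable clone : clones) {
                try {
                    return new ShimResult(clone.get(row), true);
                } catch (Exception cloneFailure) {
                    // This clone is unreachable too; try the next one.
                }
            }
            // No copy answered: surface the main table's error.
            throw mainFailure;
        }
    }
}
```

Keeping the stale flag on the result, rather than in the read call, means callers that never inspect it keep today's behavior whenever the main table is up.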
> >
> At a high level, considering all the work that would be needed in the
> client (for it to be aware of the primary and the snapshot
> regions)

Minor.  Right?  Snapshot tables would have a _snapshot suffix?

> and in the master (to do with managing the placements of the
> regions),

Balancer already factors in myriad attributes.  Adding one more rule seems
like it would be well within scope.
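As an illustration of the "one more rule", here is a hypothetical placement check (not an existing HBase balancer API) that counts every copy of a key range placed on a server that already hosts that range, across the main table and its clones. It assumes each clone region can be mapped back to its main table name and start key; a real balancer would more likely fold this into its cost function than report a raw count.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// One hosted copy of a key range: a main-table region or a clone's region.
final class Placement {
    final String mainTable; // assumption: clones can be mapped back to their main table
    final String startKey;
    final String server;

    Placement(String mainTable, String startKey, String server) {
        this.mainTable = mainTable;
        this.startKey = startKey;
        this.server = server;
    }
}

// Hypothetical balancer rule: copies of one key range must be on distinct servers.
final class DistinctCopiesRule {
    // Counts every copy placed on a server that already hosts that key range.
    static int violations(List<Placement> placements) {
        Set<List<String>> hosted = new HashSet<>();
        int count = 0;
        for (Placement p : placements) {
            // List.of gives value-based equals/hashCode, so it works as a composite key.
            if (!hosted.add(List.of(p.mainTable, p.startKey, p.server))) {
                count++;
            }
        }
        return count;
    }
}
```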

And this would be work not in the client but in a layer above the client.

> I am not convinced. Also, consider that you will be taking a
> lot of snapshots and adding to the filesystem's load for the file
> creations.
Snapshotting is a well-worn and tested code path.  Making one is a pretty
lightweight op.  Frequency would depend on what the app needs.

Could go the replication route too, another well-worn and tested code path.

Trying to minimize the new code needed to reach the objective.

> > Or how much more work would it take to follow the route our Facebook
> > brothers and sisters have taken, doing quorum reads and writes in-cluster?
> >
> If you are talking about Facebook's work that is discussed in
> HBASE-7509, the quorum reads are something that we will benefit from,
> and that will help the filesystem side of the story, but we still need
> multiple (redundant) regions for the hbase side. If a region is not
> reachable, the client could go to another replica for the region...

Quorum reads/writes as in Paxos or Raft (Liyin talked about the Facebook
Hydrabase project in his keynote at HBaseCon last year).
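For reference, the majority-quorum arithmetic that Paxos- and Raft-style replication rests on is small enough to show directly. `Quorum` is a hypothetical helper, not part of HBase or Hydrabase: a write counts as committed once a majority of the replica group acknowledges it, and because any two majorities of the same group share at least one replica, three replicas tolerate one failure and five tolerate two.

```java
// Hypothetical helper: majority-quorum arithmetic used by Paxos/Raft-style protocols.
final class Quorum {
    // Smallest number of replicas that forms a majority of n.
    static int majority(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be positive");
        return n / 2 + 1;
    }

    // Replicas that can fail while a majority is still reachable.
    static int tolerableFailures(int n) {
        return n - majority(n);
    }

    // A write counts as committed once a majority has acknowledged it.
    static boolean committed(int acks, int n) {
        return acks >= majority(n);
    }
}
```

Note that a fourth replica buys no extra fault tolerance over three (majority rises from 2 to 3), which is why such groups are usually sized at odd numbers.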

> > * When I say 'native' way in the above, what I mean is that HBase
> > has always been about giving clients a 'consistent' view -- at least when
> > the query is to the source cluster.  Introducing talk and APIs that talk of
> > 'eventual consistency' muddies our story.
> >
> >
> As we have discussed in the jira, there are use cases. And it's
> optional - all the APIs provide 'consistency' by default (status quo).
Sorry I'm behind.  Let me review.  My concern is that our shell and API will
now have notions of consistency other than "what you write is what you
read" all over them because we took on a use case that is 'interesting' but,
up to now at least, a rare request.

