hbase-dev mailing list archives

From Chad Walters <Chad.Walt...@microsoft.com>
Subject Re: [jira] Commented: (HBASE-1295) Federated HBase
Date Sat, 25 Apr 2009 17:12:12 GMT

It might be better to respond to JIRA tickets in the ticket. That way the conversation about
the issues in the ticket is kept in one place in case the issue stalls out and is picked up
months later.


On 4/24/09 8:22 PM, "Ryan Rawson" <ryanobjc@gmail.com> wrote:

We could use ZooKeeper paths as a way for replication endpoints to know
about each other.

On Fri, Apr 24, 2009 at 8:16 PM, Andrew Purtell (JIRA) <jira@apache.org> wrote:

>    [
> https://issues.apache.org/jira/browse/HBASE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702647#action_12702647]
> Andrew Purtell commented on HBASE-1295:
> ---------------------------------------
> bq. In the case a cluster is down, I guess that would mean the other
> clusters would have to keep all the WALs until it is up again. At that
> moment, it may receive tons of WALs right?
> Yes, the effect of a partition and extended outage is a buildup of WALs on
> the peer clusters, and then a lot of backlog. Let me think about this case
> and post a revised slide deck.
> bq. Also I was wondering, if you want to add a new cluster, would the way
> to go be replicating by hand (MR or else) all the data to the other cluster
> then telling somehow that the clusters have a new peer?
> I was anticipating that the cluster would be advertised as a peer, somehow,
> and replication would then start. The replicators should add tables and
> column families to their local schema on demand as the cells are received,
> and perhaps also ask the peer about column family details as
> necessary. Whether or not to bring over existing data would be a
> deployment/application concern, I think, and could be handled by an MR
> export-import job.
> > Federated HBase
> > ---------------
> >
> >                 Key: HBASE-1295
> >                 URL: https://issues.apache.org/jira/browse/HBASE-1295
> >             Project: Hadoop HBase
> >          Issue Type: New Feature
> >            Reporter: Andrew Purtell
> >         Attachments: hbase_repl.1.pdf
> >
> >
> > HBase should consider supporting a federated deployment where someone
> might have terascale (or beyond) clusters in more than one geography and
> would want the system to handle replication between the clusters/regions. It
> would be sweet if HBase had something on the roadmap to sync between
> replicas out of the box.
> > Consider if rows, columns, or even cells could be scoped: local, or
> global.
> > Then, consider a background task on each cluster that replicates new
> globally scoped edits to peer clusters. The HBase/Bigtable data model has
> convenient features (timestamps, multiversioning) such that simple exchange
> of globally scoped cells would be conflict free and would "just work".
> Implementation effort here would be in producing an efficient mechanism for
> collecting up edits from all the HRS and transmitting the edits over the
> network to peers where they would then be split out to the HRS there.
> Holding on to the edit trace and tracking it until the remote commits
> succeed would also be necessary. So, HLog is probably the right place to set
> up the tee. This would be filtered log shipping, basically.
> > This proposal does not consider transactional tables. For transactional
> tables, enforcement of global mutation commit ordering would come into the
> picture if the user wants the transaction to span the federation. This
> should be an optional feature even with transactional tables themselves
> being optional because of how slow it would be.
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
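
The quoted proposal argues that exchanging globally scoped, timestamped cells
between clusters is conflict free. A minimal sketch of that idea follows, in
plain Python rather than HBase code. Everything in it is hypothetical: the
Edit record, the "local"/"global" scope tag, and the helper names are
illustration only and are not HBase APIs. It also assumes that no two clusters
ever write different values to the same (row, column, timestamp) key; if they
did, the result would depend on arrival order.

```python
from typing import NamedTuple


class Edit(NamedTuple):
    """One logged cell write (hypothetical record, not an HBase type)."""
    row: str
    column: str     # "family:qualifier"
    timestamp: int
    value: bytes
    scope: str      # "local" or "global" (assumed scope tag)


def global_edits(log):
    """Filtered log shipping: keep only the edits meant for peer clusters."""
    return [e for e in log if e.scope == "global"]


def apply_edits(store, edits):
    """Merge edits into a store keyed by (row, column, timestamp).

    Each version has its own key, so replaying an edit or applying edits
    in a different order yields the same store.
    """
    for e in edits:
        store[(e.row, e.column, e.timestamp)] = e.value
    return store


def latest(store, row, column):
    """Return the value with the highest timestamp for a cell, or None."""
    versions = [(ts, v) for (r, c, ts), v in store.items()
                if r == row and c == column]
    if not versions:
        return None
    return max(versions, key=lambda tv: tv[0])[1]


# Two clusters each log some edits; only global ones are shipped.
log_a = [
    Edit("r1", "cf:a", 100, b"x", "global"),
    Edit("r1", "cf:tmp", 101, b"scratch", "local"),
]
log_b = [Edit("r1", "cf:a", 105, b"y", "global")]

# A peer that receives A's edits first, and one that receives B's first.
peer1 = apply_edits(apply_edits({}, global_edits(log_a)), global_edits(log_b))
peer2 = apply_edits(apply_edits({}, global_edits(log_b)), global_edits(log_a))
```

Under these assumptions both peers end up with identical contents, and
re-sending a batch after a failed acknowledgement changes nothing, which is
why a shipper can simply hold on to edits and retry until the remote commit
succeeds.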
