hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12814) Zero downtime upgrade from 94 to 98
Date Wed, 11 Feb 2015 17:51:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316628#comment-14316628 ]

Andrew Purtell commented on HBASE-12814:

Thanks [~davelatham]. Let's see if anyone else chimes in. If not, we can leave this be, I suppose.
If there's more interest, then I think [~lhofhansl] and any others interested in 0.94 will
need to decide if the patch for that version is committable. Depending on the outcome of that
discussion, it could make sense to commit this in the form I suggested, or whatever ends up
being the consensus approach.

In my opinion having a Thrift protocol option for replication is very interesting for the
reason you built it: it enables fully online upgrades via a switch over to a slave or DR buddy
no matter what else is going on. It would be good to have this around for whenever users must
ride over a major version bump. If we have it as a configurable option then I'm not concerned
about other implications, like performance, internal complexity of the regionserver, etc.

> Zero downtime upgrade from 94 to 98 
> ------------------------------------
>                 Key: HBASE-12814
>                 URL: https://issues.apache.org/jira/browse/HBASE-12814
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.94.26, 0.98.10
>            Reporter: churro morales
>            Assignee: churro morales
>         Attachments: HBASE-12814-0.94.patch, HBASE-12814-0.98.patch
> Here at Flurry we want to upgrade our HBase cluster from 0.94 to 0.98 without any
downtime while maintaining master/master replication.
> Summary:
> Replication is done via Thrift RPC between clusters. It is configurable on a peer-by-peer
basis; the one caveat is that a Thrift server starts up on every node and proxies
requests to the ReplicationSink.
> For the upgrade process:
> * in hbase-site.xml, new configuration parameters are added (two required, three optional):
> ** *Required*
> *** hbase.replication.sink.enable.thrift -> true
> *** hbase.replication.thrift.server.port -> <thrift_server_port>
> ** *Optional*
> *** hbase.replication.thrift.protection {default: AUTHENTICATION}
> *** hbase.replication.thrift.framed {default: false}
> *** hbase.replication.thrift.compact {default: true}
> - All regionservers can be rolling restarted (no downtime). All clusters must have the
respective patch for this to work.
> - The hbase shell add_peer command takes an additional parameter for the RPC protocol
> - example: {code} add_peer '1' "hbase-101:2181:/hbase", "THRIFT" {code}
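> A minimal sketch of how the two required parameters listed above might look in hbase-site.xml. The port value 9191 is only a placeholder (the description does not name a port), and the optional parameters are left out so that their stated defaults apply.
> {code:xml}
> <property>
>   <name>hbase.replication.sink.enable.thrift</name>
>   <value>true</value>
> </property>
> <!-- 9191 is a placeholder, not a value from the patch; use any free port -->
> <property>
>   <name>hbase.replication.thrift.server.port</name>
>   <value>9191</value>
> </property>
> {code}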
> Now comes the fun part: when you want to upgrade your cluster from 0.94 to 0.98, you simply
pause replication to the cluster being upgraded, do the upgrade, and un-pause replication.
 Once you have a pair of clusters replicating inbound and outbound only with the 0.98 release,
you can start replicating via the native RPC protocol by adding the peer again without the
_THRIFT_ parameter and subsequently deleting the peer that uses the Thrift protocol. Because replication
is idempotent, I don't see any issues as long as you wait for the backlog to drain after un-pausing.
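> As a rough sketch, the sequence described above could look like this in the hbase shell, run on the cluster that replicates to the one being upgraded. It assumes that pausing and un-pausing map to the stock disable_peer and enable_peer commands, that peer '1' is the Thrift peer from the add_peer example, and that '2' is a hypothetical id for the new native peer.
> {code}
> # pause replication to the cluster being upgraded
> disable_peer '1'
> # ... upgrade the target cluster, then un-pause ...
> enable_peer '1'
> # wait for the replication backlog to drain before going further
> # once both clusters run the newer release, add the peer again without "THRIFT"
> add_peer '2', "hbase-101:2181:/hbase"
> # then delete the peer that uses the Thrift protocol
> remove_peer '1'
> {code}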
> Special thanks to Francis Liu at Yahoo for laying the groundwork and Mr. Dave Latham
for his invaluable knowledge and assistance.  

This message was sent by Atlassian JIRA
