Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 11 Feb 2015 17:33:13 +0000 (UTC)
From: "Andrew Purtell (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12765264.1420592664000.15662.1423675993586@Atlassian.JIRA>
In-Reply-To: <JIRA.12765264.1420592664000@Atlassian.JIRA>
References: <JIRA.12765264.1420592664000@Atlassian.JIRA>
 <JIRA.12765264.1420592664237@arcas>
Subject: [jira] [Commented] (HBASE-12814) Zero downtime upgrade from 94 to
 98
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-12814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316600#comment-14316600 ] 

Andrew Purtell commented on HBASE-12814:
----------------------------------------

What do people think about making this a pluggable replication endpoint implementation option in its own Maven module? I think that would be a short path to commit since it side-steps a lot of the issues raised in my previous comment.

> Zero downtime upgrade from 94 to 98 
> ------------------------------------
>
>                 Key: HBASE-12814
>                 URL: https://issues.apache.org/jira/browse/HBASE-12814
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.94.26, 0.98.10
>            Reporter: churro morales
>            Assignee: churro morales
>         Attachments: HBASE-12814-0.94.patch, HBASE-12814-0.98.patch
>
>
> Here at Flurry we want to upgrade our HBase cluster from 94 to 98 without any downtime while maintaining master/master replication.
> Summary:
> Replication is done via Thrift RPC between clusters.  It is configurable on a peer-by-peer basis; the one caveat is that a Thrift server starts up on every node and proxies requests to the ReplicationSink.
> For the upgrade process:
> * in hbase-site.xml new configuration parameters are added (two required, three optional):
> ** *Required*
> *** hbase.replication.sink.enable.thrift -> true
> *** hbase.replication.thrift.server.port -> <thrift_server_port>
> ** *Optional*
> *** hbase.replication.thrift.protection {default: AUTHENTICATION}
> *** hbase.replication.thrift.framed {default: false}
> *** hbase.replication.thrift.compact {default: true}
> - All regionservers can be rolling-restarted (no downtime); all clusters must have the respective patch for this to work.
> - the hbase shell add_peer command takes an additional parameter for rpc protocol
> - example: {code} add_peer '1', "hbase-101:2181:/hbase", "THRIFT" {code}
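> A rough sketch (not taken from the patch) of how the properties listed above might appear in hbase-site.xml. The port value is a placeholder to be replaced, and the optional entries simply restate the defaults given above:
> {code:xml}
> <!-- Required -->
> <property>
>   <name>hbase.replication.sink.enable.thrift</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hbase.replication.thrift.server.port</name>
>   <!-- placeholder: substitute the actual Thrift server port -->
>   <value>THRIFT_SERVER_PORT</value>
> </property>
> <!-- Optional (values shown are the stated defaults) -->
> <property>
>   <name>hbase.replication.thrift.protection</name>
>   <value>AUTHENTICATION</value>
> </property>
> <property>
>   <name>hbase.replication.thrift.framed</name>
>   <value>false</value>
> </property>
> <property>
>   <name>hbase.replication.thrift.compact</name>
>   <value>true</value>
> </property>
> {code}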
> Now comes the fun part: when you want to upgrade your cluster from 94 to 98, you simply pause replication to the cluster being upgraded, do the upgrade, and un-pause replication.  Once a pair of clusters is replicating inbound and outbound only with the 98 release, you can start replicating via the native RPC protocol by adding the peer again without the _THRIFT_ parameter and subsequently deleting the peer that uses the Thrift protocol.  Because replication is idempotent, I don't see any issues as long as you wait for the backlog to drain after un-pausing replication.
> Special thanks to Francis Liu at Yahoo for laying the groundwork and Mr. Dave Latham for his invaluable knowledge and assistance.  


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)