Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 11 Feb 2015 17:41:11 +0000 (UTC)
From: "Dave Latham (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12765264.1420592664000.15806.1423676471925@Atlassian.JIRA>
In-Reply-To: <JIRA.12765264.1420592664000@Atlassian.JIRA>
References: <JIRA.12765264.1420592664000@Atlassian.JIRA>
 <JIRA.12765264.1420592664237@arcas>
Subject: [jira] [Commented] (HBASE-12814) Zero downtime upgrade from 94 to
 98
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-12814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316612#comment-14316612 ] 

Dave Latham commented on HBASE-12814:
-------------------------------------

Thanks for the suggestion, Andrew.

We did not originally intend this to become a long-term feature.  We put up the patch because we heard other people were interested and might give us some extra review or find a use for it.  We are OK if it isn't committed, since we won't need to maintain it after we make the jump to 0.98.  (In general our policy is not to fork anything that we would have to maintain and can't push upstream, but short term is OK.)

If there is interest in actually including it in a release in some form, we could do a bit of work to get it there, or we are fine with seeing it land in a separate module or some other form.

> Zero downtime upgrade from 94 to 98 
> ------------------------------------
>
>                 Key: HBASE-12814
>                 URL: https://issues.apache.org/jira/browse/HBASE-12814
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.94.26, 0.98.10
>            Reporter: churro morales
>            Assignee: churro morales
>         Attachments: HBASE-12814-0.94.patch, HBASE-12814-0.98.patch
>
>
> Here at Flurry we want to upgrade our HBase cluster from 94 to 98 while not having any downtime and maintaining master / master replication. 
> Summary:
> Replication is done via Thrift RPC between clusters.  It is configurable on a peer-by-peer basis; the one caveat is that a Thrift server starts on every node and proxies requests to the ReplicationSink.
> For the upgrade process:
> * in hbase-site.xml two new configuration parameters are added:
> ** *Required*
> *** hbase.replication.sink.enable.thrift -> true
> *** hbase.replication.thrift.server.port -> <thrift_server_port>
> ** *Optional*
> *** hbase.replication.thrift.protection {default: AUTHENTICATION}
> *** hbase.replication.thrift.framed {default: false}
> *** hbase.replication.thrift.compact {default: true}
> - All regionservers can be rolling-restarted (no downtime); all clusters must have the respective patch for this to work.
> - the hbase shell add_peer command takes an additional parameter for rpc protocol
> - example: {code} add_peer '1' "hbase-101:2181:/hbase", "THRIFT" {code}
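>
> A minimal sketch of how the parameters listed above might look in hbase-site.xml. The property names are taken from the list; the port value 9091 is an arbitrary placeholder (the description does not state one), and the optional property is shown with the default given above.
>
> ```xml
> <!-- Add inside the existing <configuration> element of hbase-site.xml -->
>
> <!-- Required: accept replication traffic over Thrift -->
> <property>
>   <name>hbase.replication.sink.enable.thrift</name>
>   <value>true</value>
> </property>
>
> <!-- Required: port for the per-node Thrift server (9091 is a placeholder) -->
> <property>
>   <name>hbase.replication.thrift.server.port</name>
>   <value>9091</value>
> </property>
>
> <!-- Optional: shown with the default stated above -->
> <property>
>   <name>hbase.replication.thrift.framed</name>
>   <value>false</value>
> </property>
> ```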
> Now comes the fun part: when you want to upgrade a cluster from 94 to 98, you simply pause replication to the cluster being upgraded, do the upgrade, and un-pause replication.  Once you have a pair of clusters replicating inbound and outbound only with the 98 release, you can start replicating via the native rpc protocol by adding the peer again without the _THRIFT_ parameter and subsequently deleting the peer that uses the thrift protocol.  Because replication is idempotent, I don't see any issues as long as you wait for the backlog to drain after un-pausing replication.
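>
> The sequence just described might look roughly like this in the hbase shell, run on a cluster that replicates to the one being upgraded. This is only a sketch: it assumes "pause" and "un-pause" map to the stock disable_peer and enable_peer commands, that peer '1' is the Thrift peer from the add_peer example, and that '2' is a free peer id (hypothetical).
>
> ```
> # Pause replication to the cluster about to be upgraded
> disable_peer '1'
>
> # ... upgrade that cluster from 94 to 98 ...
>
> # Un-pause replication, then wait for the backlog to drain
> enable_peer '1'
>
> # Once both clusters run 98: add the same cluster again without
> # the "THRIFT" argument so it replicates over the native rpc protocol
> add_peer '2', "hbase-101:2181:/hbase"
>
> # Then delete the peer that uses the thrift protocol
> remove_peer '1'
> ```
>
> list_peers can be used between steps to check which peers are defined and whether they are enabled.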
> Special thanks to Francis Liu at Yahoo for laying the groundwork and Mr. Dave Latham for his invaluable knowledge and assistance.  

