Date: Wed, 11 Feb 2015 17:33:13 +0000 (UTC)
From: "Andrew Purtell (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-12814) Zero downtime upgrade from 94 to 98

    [ https://issues.apache.org/jira/browse/HBASE-12814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316600#comment-14316600 ]

Andrew Purtell commented on HBASE-12814:
----------------------------------------

What do people think about making this a pluggable replication endpoint implementation option in its own Maven module? I think that would be a short path to commit since it side-steps a lot of the issues raised in my previous comment.

> Zero downtime upgrade from 94 to 98
> ------------------------------------
>
>                 Key: HBASE-12814
>                 URL: https://issues.apache.org/jira/browse/HBASE-12814
>             Project: HBase
>          Issue Type: New Feature
>   Affects Versions: 0.94.26, 0.98.10
>            Reporter: churro morales
>            Assignee: churro morales
>         Attachments: HBASE-12814-0.94.patch, HBASE-12814-0.98.patch
>
>
> Here at Flurry we want to upgrade our HBase cluster from 94 to 98 without any downtime while maintaining master/master replication.
> Summary:
> Replication is done via thrift RPC between clusters. It is configurable on a peer-by-peer basis; the one caveat is that a thrift server starts up on every node and proxies requests to the ReplicationSink.
> For the upgrade process:
> * in hbase-site.xml two new configuration parameters are added:
> ** *Required*
> *** hbase.replication.sink.enable.thrift -> true
> *** hbase.replication.thrift.server.port ->
> ** *Optional*
> *** hbase.replication.thrift.protection {default: AUTHENTICATION}
> *** hbase.replication.thrift.framed {default: false}
> *** hbase.replication.thrift.compact {default: true}
> - All regionservers can be rolling restarted (no downtime); all clusters must have the respective patch for this to work.
> - the hbase shell add_peer command takes an additional parameter for the rpc protocol
> - example: {code} add_peer '1' "hbase-101:2181:/hbase", "THRIFT" {code}
> Now comes the fun part: when you want to upgrade your cluster from 94 to 98, you simply pause replication to the cluster being upgraded, do the upgrade, and un-pause replication. Once you have a pair of clusters replicating inbound and outbound only with the 98 release, you can start replicating via the native rpc protocol by adding the peer again without the _THRIFT_ parameter and subsequently deleting the peer that uses the thrift protocol. Because replication is idempotent, I don't see any issues as long as you wait for the backlog to drain after un-pausing replication.
> Special thanks to Francis Liu at Yahoo for laying the groundwork and Mr. Dave Latham for his invaluable knowledge and assistance.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
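
For reference, the configuration parameters listed in the quoted description could be written into hbase-site.xml roughly as follows. This is only a sketch: the property names and defaults are copied from the description, but the port value (9091) is a made-up placeholder because the description does not give one, and the helper names `build_site_fragment` and `render` are hypothetical. It uses the standard Hadoop-style `<configuration>/<property>/<name>/<value>` layout.

```python
import xml.etree.ElementTree as ET

# Required settings from the issue description.
# NOTE: the description gives no port value; 9091 is a hypothetical placeholder.
REQUIRED = {
    "hbase.replication.sink.enable.thrift": "true",
    "hbase.replication.thrift.server.port": "9091",
}

# Optional settings, shown with the defaults stated in the description.
OPTIONAL_DEFAULTS = {
    "hbase.replication.thrift.protection": "AUTHENTICATION",
    "hbase.replication.thrift.framed": "false",
    "hbase.replication.thrift.compact": "true",
}


def build_site_fragment(required, optional=None):
    """Build a <configuration> element with one <property> per setting."""
    root = ET.Element("configuration")
    settings = dict(required)
    settings.update(optional or {})
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


def render(root):
    """Serialize the element tree to a string, indented when supported."""
    if hasattr(ET, "indent"):  # available in Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render(build_site_fragment(REQUIRED, OPTIONAL_DEFAULTS)))
```

Listing the optional properties explicitly is not necessary if the defaults are acceptable; they are included here only to show where an override would go.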