hbase-issues mailing list archives

From "Jeffrey Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time
Date Tue, 12 Nov 2013 17:48:19 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820278#comment-13820278 ]

Jeffrey Zhong commented on HBASE-9360:

Thanks [~jmspaggi]! Very nice of you to try it out. For your question:
"if writes are happening in 0.94, how are you perfectly sure of the start time of the replication?"
This is the same as a normal replication setup. After turning on replication for all tables in the
source 0.94 cluster, say the timestamp at that point is T1. When you export data from the source
0.94 cluster, you can export data up to T1+15secs (some buffer to cover clock drift). This means
the replication stream and the import have a small overlapping time window, which guarantees all
the data is copied over. The overlap is safe because replication & import copy data using only
puts & deletes, which are idempotent.
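The idempotency argument above can be sketched with a toy in-memory "table" (a plain map, not the real HBase API): replaying the same batch of puts and deletes a second time leaves the table in exactly the same state, so the export/replication overlap window is harmless.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of why the replication/import overlap is safe: puts and
// deletes are idempotent, so applying the same edits twice yields the
// same table state. Class and method names here are illustrative only.
public class IdempotentReplay {
    static void applyEdits(Map<String, String> table) {
        table.put("row1", "v1");   // Put: last write wins
        table.put("row2", "v2");   // re-putting the same cell is a no-op
        table.remove("row3");      // Delete: deleting twice is a no-op
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("row3", "stale");

        applyEdits(table);                        // edits arrive via import
        Map<String, String> once = new HashMap<>(table);
        applyEdits(table);                        // same edits arrive again via replication
        System.out.println(table.equals(once));   // true: the overlap changed nothing
    }
}
```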

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -------------------------------------------------------------
>                 Key: HBASE-9360
>                 URL: https://issues.apache.org/jira/browse/HBASE-9360
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: migration
>    Affects Versions: 0.98.0, 0.96.0
>            Reporter: Jeffrey Zhong
> As we know, 0.96 is a singularity release: as of today a 0.94 HBase user has to do an in-place
upgrade: make the corresponding client changes, recompile client application code, fully shut
down the existing 0.94 cluster, deploy the 0.96 binary, run the upgrade script and then start the
upgraded cluster. You can imagine the down time will be extended if something goes wrong in between.

> To minimize the down time, another possible way is to set up a secondary 0.96 cluster
and then set up replication between the existing 0.94 cluster and the new 0.96 slave cluster.
Once the 0.96 cluster is in sync, a user can switch the traffic to the 0.96 cluster and decommission
the old one.
> The ideal steps would be:
> 1) Set up a 0.96 cluster
> 2) Set up replication from the running 0.94 cluster to the newly created 0.96 cluster
> 3) Wait till the two clusters are in sync
> 4) Start duplicated writes to both the 0.94 and 0.96 clusters (replication could be stopped now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, if everything is good, stop writes to the original 0.94 cluster
and complete the upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea by [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it seems the
best approach is to build a very similar service, or build on top of https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep,
with support for three commands: replicateLogEntries, multi and delete. Inside the three commands,
we just pass the corresponding requests down to the destination 0.96 cluster, acting as a bridge.
The reason to support multi and delete is so that CopyTable can copy data from a 0.94 cluster
to a 0.96 one.
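The bridge described above can be sketched as a small dispatcher: it accepts the three 0.94-facing calls and forwards each one as plain puts/deletes to the 0.96 side. All types below (the `Sink` interface, the `String[]` edit encoding) are stand-ins for illustration, not real HBase classes.

```java
import java.util.List;

// Sketch of the proposed 0.94 -> 0.96 bridge. It exposes the three
// commands named in the proposal and forwards each request to the
// destination 0.96 cluster, represented here by a minimal Sink.
public class ReplicationBridge {
    // Stand-in for the 0.96 destination; a real bridge would hold a
    // 0.96 HTable/connection here.
    interface Sink {
        void put(String row, String value);
        void delete(String row);
    }

    private final Sink sink;
    ReplicationBridge(Sink sink) { this.sink = sink; }

    // Called by 0.94 region servers shipping WAL edits; each edit is
    // encoded as {row, value}, with a null value meaning a delete.
    void replicateLogEntries(List<String[]> walEdits) {
        for (String[] edit : walEdits) {
            if (edit[1] == null) sink.delete(edit[0]);
            else sink.put(edit[0], edit[1]);
        }
    }

    // multi and delete exist so CopyTable can push data through the bridge.
    void multi(List<String[]> puts) {
        for (String[] p : puts) sink.put(p[0], p[1]);
    }

    void delete(String row) { sink.delete(row); }
}
```

The key design point is that the bridge never interprets the edits; it only translates the transport, which is what keeps it small and version-agnostic on the 0.96 side.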
> The other approach is to provide limited support of the 0.94 RPC protocol in 0.96. An
issue with this is that a 0.94 client needs to talk to ZooKeeper first before it can connect
to a 0.96 region server, so we would need a fake ZooKeeper setup in front of the 0.96 cluster
for a 0.94 client to connect to. It may also pollute the 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 one at the same time, we need to load both
hbase clients into a single JVM using different class loaders.
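The dual-class-loader idea can be sketched with two `URLClassLoader` instances whose parent is `null`: only bootstrap classes are shared, so each loader would resolve the conflicting `org.apache.hadoop.hbase.*` classes from its own jar. The jar paths below are placeholders, not real artifact locations.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch of loading the 0.94 and 0.96 HBase client jars in one JVM via
// isolated class loaders. With a null parent, nothing above the
// bootstrap loader is shared, so identically named classes in the two
// jars do not collide. Jar paths are placeholders.
public class DualClientLoaders {
    public static void main(String[] args) throws Exception {
        URLClassLoader cl94 = new URLClassLoader(
                new URL[] { new URL("file:/path/to/hbase-0.94-client.jar") }, null);
        URLClassLoader cl96 = new URLClassLoader(
                new URL[] { new URL("file:/path/to/hbase-0.96-client.jar") }, null);

        // Bootstrap classes (java.lang.*, etc.) still resolve through
        // either loader, so the two halves can exchange JDK types...
        System.out.println(cl94.loadClass("java.lang.String") == String.class); // true

        // ...while application classes loaded via cl94 and cl96 would be
        // distinct Class objects, letting both client versions coexist.
        cl94.close();
        cl96.close();
    }
}
```

In practice the caller would reflectively instantiate each client behind a small shared interface, since code compiled against one client's classes cannot directly reference the other's.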
> Let me know if you think this is worth doing, and whether there is a better approach we could take.
> Thanks!

This message was sent by Atlassian JIRA
