hbase-issues mailing list archives

From "Demai Ni (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9047) Tool to handle finishing replication when the cluster is offline
Date Sun, 18 Aug 2013 18:30:49 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743322#comment-13743322 ]

Demai Ni commented on HBASE-9047:
---------------------------------

Folks,

May I ask for your suggestions on a couple of questions? All comments on this prototype code
are greatly appreciated.

First question: I put this standalone tool under "package org.apache.hadoop.hbase.util.hbck",
so it can be run from the CLI using "$ hbase org.apache.hadoop.hbase.util.hbck.ReplicationSyncUp".
Is this the right place?

Second question: I am using HConnection#getZooKeeperWatcher(), which is deprecated. I asked on
the dev list about an alternative route; thanks for the comments and suggestions there. I'd
also like to provide the whole picture here. Basically, I'd like to access ZooKeeper while
HBase is offline, then use a dummy server to move all the /hbase/replication/rs info from the
original ZooKeeper entries to the new dummy server, after which a replication manager is
initialized and does its magic. So the question is how to get the original ZooKeeper info.

Sorry that the prototype code is kind of messy; I have summarized the code flow below to make
it easier to read:
{code}
/**
 * hbase org.apache.hadoop.hbase.util.hbck.ReplicationSyncUp
 */
public class ReplicationSyncUp {

  public static void main(String[] args) throws Exception {
...
    conf = HBaseConfiguration.create();
    connection = HConnectionManager.getConnection(conf);
    // This method is deprecated in 0.94, but its replacement has not been backported to 0.94,
    // so we have to use it for now.
    zkw = connection.getZooKeeperWatcher();
...
    replicationZK = new ReplicationZookeeper(connection, conf, zkw);
...
    replication = new Replication(new DummyServer(), fs, logDir, oldLogDir);
    manager = replication.getReplicationManager();
....
    List<WALActionsListener> listeners = new ArrayList<WALActionsListener>();
    listeners.add(replication);
....
    manager.init(); // the magic happens here
.... // grab the /hbase/replication/rs queues from the original ZooKeeper
.... // put them under the dummy server
.... // push the edits to the slave cluster

    // tear down
    manager.join();
  }

  static class DummyServer implements Server {
    String hostname;

    DummyServer() {
      // a unique name, in case the first run fails
      hostname = System.currentTimeMillis() + ".SyncUpTool.replication.org";
    }
  ...
  }
}

{code} 
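
To make the queue-claiming step in the flow above a bit more concrete, here is a minimal, self-contained sketch. It is not the prototype's code and it uses no HBase or ZooKeeper APIs: an in-memory map stands in for the znodes under /hbase/replication/rs, and the class name QueueClaimSketch, the method claimOrphanedQueues, the example host names, and the "queue-formerOwner" tagging are all hypothetical choices made for illustration. It only shows the idea that a dummy server takes over every queue whose owner is not running, which means all of them when the whole cluster is down.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not part of HBase or of the attached patch. */
public class QueueClaimSketch {

  /**
   * Moves every queue whose owner is not in liveServers under the claimer.
   *
   * @param rsQueues in-memory stand-in for /hbase/replication/rs:
   *                 region server name -> replication queue ids
   * @return the queues that were claimed, each tagged with its former owner
   */
  public static List<String> claimOrphanedQueues(Map<String, List<String>> rsQueues,
      Set<String> liveServers, String claimer) {
    List<String> claimed = new ArrayList<String>();
    // Iterate over a copy of the keys because the map is modified inside the loop.
    for (String server : new ArrayList<String>(rsQueues.keySet())) {
      if (server.equals(claimer) || liveServers.contains(server)) {
        continue; // the claimer's own queues and those of running servers stay put
      }
      for (String queue : rsQueues.remove(server)) {
        claimed.add(queue + "-" + server);
      }
    }
    if (!claimed.isEmpty()) {
      List<String> mine = rsQueues.get(claimer);
      if (mine == null) {
        mine = new ArrayList<String>();
        rsQueues.put(claimer, mine);
      }
      mine.addAll(claimed);
    }
    return claimed;
  }

  public static void main(String[] args) {
    Map<String, List<String>> rsQueues = new LinkedHashMap<String, List<String>>();
    rsQueues.put("rs1.example.org", new ArrayList<String>(Arrays.asList("1")));
    rsQueues.put("rs2.example.org", new ArrayList<String>(Arrays.asList("1", "2")));

    // Whole cluster offline: no region server is live, so every queue is orphaned.
    Set<String> live = Collections.<String>emptySet();
    // Same naming idea as DummyServer above: unique per run.
    String dummy = System.currentTimeMillis() + ".SyncUpTool.replication.org";

    List<String> claimed = claimOrphanedQueues(rsQueues, live, dummy);
    System.out.println("claimed: " + claimed);
    System.out.println("state:   " + rsQueues);
  }
}
```

In the real tool this bookkeeping happens in ZooKeeper rather than in a map, so the sketch says nothing about how the claim is made safe when several instances of the tool run at once.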


                
> Tool to handle finishing replication when the cluster is offline
> ----------------------------------------------------------------
>
>                 Key: HBASE-9047
>                 URL: https://issues.apache.org/jira/browse/HBASE-9047
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Jean-Daniel Cryans
>            Assignee: Demai Ni
>         Attachments: hbase-9047-0.94.9-v0
>
>
> We're having a discussion on the mailing list about replicating the data on a cluster
> that was shut down in an offline fashion. The motivation could be that you don't want to bring
> HBase back up but still need that data on the slave.
> So I have this idea of a tool that would be running on the master cluster while it is
> down, although it could also run at any time. Basically it would be able to read the replication
> state of each master region server, finish replicating what's missing to all the slaves, and
> then clear that state in ZooKeeper.
> The code that handles replication does most of that already, see ReplicationSourceManager
> and ReplicationSource. Basically when ReplicationSourceManager.init() is called, it will check
> all the queues in ZK and try to grab those that aren't attached to a region server. If the
> whole cluster is down, it will grab all of them.
> The beautiful thing here is that you could start that tool on all your machines and the
> load will be spread out, but that might not be a big concern if replication wasn't lagging
> since it would take a few seconds to finish replicating the missing data for each region server.
> I'm guessing when starting ReplicationSourceManager you'd give it a fake region server
> ID, and you'd tell it not to start its own source.
> FWIW the main difference in how replication is handled between Apache's HBase and Facebook's
> is that the latter is always done separately from HBase itself. This jira isn't about doing
> that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
