hbase-dev mailing list archives

From "Demai Ni (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-9220) An API(and shell command) to list tables replicated TO the current cluster
Date Wed, 14 Aug 2013 21:29:47 GMT
Demai Ni created HBASE-9220:

             Summary: An API(and shell command) to list tables replicated TO the current cluster

                 Key: HBASE-9220
                 URL: https://issues.apache.org/jira/browse/HBASE-9220
             Project: HBase
          Issue Type: New Feature
          Components: Replication, shell
         Environment: clusters setup as Master and Slave for replication of tables
            Reporter: Demai Ni

This JIRA tracks the discussion that continued from HBASE-8663, and will hopefully surface a
better way to handle this use case:

An administrator or developer who has 'list table' access to a cluster would like to know
which tables/families are replicated to that cluster (i.e. the slave), so that he/she won't
mess things up.

While HBASE-8663 covered the API to get the list of tables and families replicated from the
current cluster (i.e. the Master), there is no conclusion on how to do the same for tables
replicated TO the current cluster (i.e. the slave). Several ideas were entertained during
HBASE-8663's discussion and are summarized here:

* *Idea 1*: on the Slave cluster, add a new String attribute REPLICATION_MASTER to HColumnDescriptor
to indicate which master this column family is replicated from. A check can be added to ensure
the value of REPLICATION_MASTER is valid at the time it is set.
** problem 1) a slave can have more than one master (a minor problem);
** problem 2) the consistency is broken if the Master cluster runs 'remove_peer' (a major problem,
which requires a synchronous call to the remote master/peer cluster)
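
A minimal sketch of Idea 1's set-time check and of problem 2, written in plain Python rather than against the HBase API. The `set_replication_master` helper, the `known_masters` set, and the dict standing in for a column family descriptor are all hypothetical; only the attribute name REPLICATION_MASTER comes from the proposal.

```python
# Hypothetical model: a column family descriptor is a plain dict, and
# known_masters is whatever set of master cluster keys the slave can verify.

def set_replication_master(family, master, known_masters):
    """Set REPLICATION_MASTER, rejecting values the slave cannot validate."""
    if master not in known_masters:
        raise ValueError(f"unknown master cluster: {master!r}")
    family["REPLICATION_MASTER"] = master
    return family

known_masters = {"Master_A.hbase.com:2181:/hbase"}
cf1 = {"NAME": "cf1"}
set_replication_master(cf1, "Master_A.hbase.com:2181:/hbase", known_masters)

# Problem 2: if the master later runs remove_peer, nothing updates the slave,
# so the attribute keeps naming a master that no longer replicates here.
# Dropping the entry from known_masters stands in for that remote change.
known_masters.discard("Master_A.hbase.com:2181:/hbase")
stale = cf1["REPLICATION_MASTER"] not in known_masters
```

The check only holds at the moment the attribute is set, which is why the proposal notes that keeping it correct would need a synchronous call to the remote cluster.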

* *Idea 2*: reuse REPLICATION_SCOPE, and give a new meaning to the value '-1'. If a table is
replicated to this cluster, its REPLICATION_SCOPE must be set to -1 before a replication can
** problem 1) incompatible change. Currently the slave-side table looks just like a normal
table; the new change will require the user to explicitly set REPLICATION_SCOPE = -1
** problem 2) incompatible change. Currently any non-zero value of REPLICATION_SCOPE is
treated as if it were 1 (global replication), so the change will impact existing tables
** problem 3) the value '-1' only tells the user that the table is replicated to the current
cluster; it cannot indicate the source/Master cluster
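
The incompatibility in problem 2 can be sketched as follows. This is plain Python, not HBase code; both function names and the returned labels are hypothetical, and the "current" rule simply restates the behavior described above.

```python
def is_replicated_today(scope):
    # Current behavior as described above: any non-zero value acts like 1.
    return scope != 0

def interpret_proposed(scope):
    # Hypothetical Idea 2 semantics: -1 marks a table replicated to this cluster.
    if scope == -1:
        return "replicated TO this cluster"
    if scope != 0:
        return "replicated FROM this cluster"
    return "not replicated"

# An existing table that happens to carry -1 is treated as a replication
# source today, but would be read as a replication target under Idea 2.
before = is_replicated_today(-1)
after = interpret_proposed(-1)
```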

* *Idea 3*: invent a new HColumnDescriptor attribute 'replication_peers', an array of IDs.
We can use positive IDs for target clusters and negative IDs for source clusters, for example:

hbase(main):004:0> list_peers
 1 Slave_A.hbase.com:2181:/hbase ENABLED
 2 Slave_B.hbase.com:2181:/hbase ENABLED
 3 Slave_Master_C.hbase.com:2181:/hbase ENABLED
-1 Master_A.hbase.com:2181:/hbase ENABLED
-2 Master_B.hbase.com:2181:/hbase ENABLED
-3 Slave_Master_C.hbase.com:2181:/hbase ENABLED
>describe table
't1_dn', {NAME => 'cf1', REPLICATION_PEERS => '1,2,3', ..}
't2_dn', {NAME => 'cf1', REPLICATION_PEERS => '-1,-2',..}
't3_dn', {NAME => 'cf1', REPLICATION_PEERS => '3,-3',..}

t1_dn#cf1 is replicated from this cluster, and its slave clusters are Slave_A,Slave_B and
t2_dn#cf1 is replicated to this cluster, and its master clusters are Master_A and Master_B
t3_dn#cf1 is set up for Master_Slave replication with Slave_Master_C.hbase.com (although the
master and the slave don't have to be the same cluster)
** problem: similar to idea 1, though an improved version. A synchronous call can be implemented
through the peer ID
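
Reading the proposed attribute could look roughly like this. It is a sketch in plain Python under the sign convention above (positive ID = target cluster, negative ID = source cluster); `split_peers` is a hypothetical helper, and the table data is copied from the example output.

```python
def split_peers(replication_peers):
    """Split a REPLICATION_PEERS string into (target_ids, source_ids)."""
    ids = [int(tok) for tok in replication_peers.split(",") if tok.strip()]
    targets = sorted(i for i in ids if i > 0)
    sources = sorted((i for i in ids if i < 0), reverse=True)
    return targets, sources

# REPLICATION_PEERS values for cf1 of each table in the example above.
tables = {"t1_dn": "1,2,3", "t2_dn": "-1,-2", "t3_dn": "3,-3"}

# Tables replicated TO the current cluster are those with any source ID.
replicated_to_here = [name for name, peers in tables.items()
                      if split_peers(peers)[1]]
```

With the example data, `replicated_to_here` contains t2_dn and t3_dn, which is the list this JIRA asks for.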

* *Idea 4*: a central replication controller that resides outside of all the clusters. The
controller would communicate with all clusters and keep the info consistent. It could be a very
good operational manager for users who have 10+ clusters to oversee, and other features (such
as backup/restore) could leverage the framework
** problem: well, not really a problem per se, except that the effort for the whole solution is
pretty large and needs some cleanup work first. For example, 'add_peer' currently doesn't check
the value, and we need to fix that first; and replication setup relies on manually creating the
table on the peer slave, whereas we may want to ensure the same schema and do it automatically
from the Master cluster.
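
One piece of the cleanup work, checking that the slave has the same schema before replication is set up, might be sketched like this. It is plain Python with schemas modeled as hypothetical dicts of table name to column family names; `missing_on_slave` is not an existing HBase call.

```python
def missing_on_slave(master_schema, slave_schema):
    """Return {table: [families]} present on the master but absent on the slave."""
    missing = {}
    for table, families in master_schema.items():
        absent = sorted(set(families) - set(slave_schema.get(table, ())))
        if absent:
            missing[table] = absent
    return missing

# Illustrative schemas only; a controller would fetch these from each cluster.
master = {"t1_dn": ["cf1", "cf2"]}
slave = {"t1_dn": ["cf1"]}
gaps = missing_on_slave(master, slave)
```

A controller could refuse to enable replication, or create the missing families itself, whenever `gaps` is non-empty.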

