hbase-issues mailing list archives

From "Feng Honghua (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8751) Enable peer cluster to choose/change the ColumnFamilies/Tables it really want to replicate from a source cluster
Date Mon, 16 Sep 2013 23:44:52 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768963#comment-13768963 ]

Feng Honghua commented on HBASE-8751:
-------------------------------------

[~jdcryans]

Thanks for the thorough code review, but the following is not accurate:

bq. This is in ReplicationSource.removeNonReplicableEdits() and that method is called for
each HLog.Entry, which means that you'd hit ZK from all the region servers for as many write
calls as they are getting. That seems excessive.

==> zkHelper.getTableCFs(peerId) delegates to ReplicationPeer.getTableCFs, and ReplicationPeer
keeps the current table/cf config in its tableCFs field and simply returns that field on each
getTableCFs call. ReplicationPeer also has a tableCFTracker that watches the tableCF zk node and
updates the tableCFs field whenever that node is changed (by a user via the shell). This is
similar to how the peer state (enable/disable) is handled.
    So the tableCF zk node is accessed only as many times as it is updated, not as many times as
ReplicationSource.removeNonReplicableEdits() is called (once for each HLog.Entry).
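
To illustrate the access pattern described above, here is a minimal sketch. It is not the code from the patch: CachedPeerConfig, onTableCfNodeChanged and the zkReads counter are hypothetical names, and the ZooKeeper watcher is simulated by a plain method call. It only shows that the per-entry path reads a cached field, while the zk node is read once per change.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for ReplicationPeer; not the actual HBase class.
class CachedPeerConfig {
    // Counts simulated reads of the tableCF zk node so the pattern is visible.
    final AtomicInteger zkReads = new AtomicInteger();

    // Cached table -> column-family config; replaced wholesale on each update.
    private volatile Map<String, List<String>> tableCFs = Collections.emptyMap();

    // Invoked by the (simulated) tracker when the tableCF zk node changes.
    void onTableCfNodeChanged(Map<String, List<String>> newConfig) {
        zkReads.incrementAndGet(); // one read of the zk node per change
        tableCFs = Collections.unmodifiableMap(new HashMap<>(newConfig));
    }

    // Called once per log entry on the hot path; touches only the cached field.
    Map<String, List<String>> getTableCFs() {
        return tableCFs;
    }
}
```

With this shape, calling getTableCFs() for every HLog.Entry leaves zkReads unchanged; only a config change through the tracker increments it.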
                
> Enable peer cluster to choose/change the ColumnFamilies/Tables it really want to replicate from a source cluster
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-8751
>                 URL: https://issues.apache.org/jira/browse/HBASE-8751
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>            Reporter: Feng Honghua
>         Attachments: HBASE-8751-0.94-V0.patch
>
>
> Consider the following scenario (all cfs have replication-scope=1):
> 1) cluster S has 3 tables: table A has cfA,cfB; table B has cfX,cfY; table C has cf1,cf2.
> 2) cluster X wants to replicate table A : cfA, table B : cfX and table C from cluster S.
> 3) cluster Y wants to replicate table B : cfY, table C : cf2 from cluster S.
> The current replication implementation can't achieve this, since it pushes the data of all replicable column-families from cluster S to all of its peers (X and Y in this scenario).
> This improvement provides a fine-grained replication scheme that enables a peer cluster to choose the column-families/tables it really wants from the source cluster:
> A). Set the table:cf-list for a peer when addPeer:
>   hbase-shell> add_peer '3', "zk:1100:/hbase", "table1; table2:cf1,cf2; table3:cf2"
> B). View the table:cf-list config for a peer using show_peer_tableCFs:
>   hbase-shell> show_peer_tableCFs "1"
> C). Change/set the table:cf-list for a peer using set_peer_tableCFs:
>   hbase-shell> set_peer_tableCFs '2', "table1:cfX; table2:cf1; table3:cf1,cf2"
> In this scheme, replication-scope=1 only means a column-family CAN be replicated to other clusters; the 'table:cf-list' alone determines WHICH cf/table is actually replicated to a specific peer.
> For backward compatibility, an empty 'table:cf-list' replicates all replicable cf/table. (This means we don't allow a peer that replicates nothing from a source cluster, which we think is reasonable: if it replicates nothing, why bother adding a peer?)
> This improvement addresses the exact problem raised by the first FAQ in "http://hbase.apache.org/replication.html":
>   "GLOBAL means replicate? Any provision to replicate only to cluster X and not to cluster Y? or is that for later?
>   Yes, this is for much later."
> I also noticed somebody mentioned that "replication-scope" is an integer rather than a boolean for exactly this fine-grained replication purpose, but I think extending "replication-scope" can't achieve the same granularity and flexibility as the per-peer replication configuration above.
> This improvement has been running smoothly in our production clusters (Xiaomi) for several months.
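
The table:cf-list semantics in the description can be sketched as follows. This is an illustration under stated assumptions, not the code from the attached patch: TableCfList, parse and shouldReplicate are hypothetical names, the string format is taken from the shell examples above, and the column family is assumed to already have replication-scope=1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names and structure are assumptions for illustration.
class TableCfList {
    // Parses e.g. "table1; table2:cf1,cf2" into table -> cf list.
    // A table with an empty cf list means "all replicable cfs of that table".
    static Map<String, List<String>> parse(String spec) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return result; // empty config
        }
        for (String entry : spec.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split(":", 2);
            List<String> cfs = new ArrayList<>();
            if (parts.length == 2) {
                for (String cf : parts[1].split(",")) {
                    if (!cf.trim().isEmpty()) {
                        cfs.add(cf.trim());
                    }
                }
            }
            result.put(parts[0].trim(), cfs);
        }
        return result;
    }

    // Decides whether an edit for (table, cf) goes to a peer with this config.
    static boolean shouldReplicate(Map<String, List<String>> tableCFs,
                                   String table, String cf) {
        if (tableCFs.isEmpty()) {
            return true; // empty list replicates everything (backward compatibility)
        }
        List<String> cfs = tableCFs.get(table);
        if (cfs == null) {
            return false; // table not listed for this peer
        }
        return cfs.isEmpty() || cfs.contains(cf);
    }
}
```

For cluster X in the scenario, parse("A:cfA; B:cfX; C") accepts A:cfA and every cf of C, and rejects A:cfB and B:cfY.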

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
