hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13153) enable bulkload to support replication
Date Thu, 27 Aug 2015 10:17:47 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716421#comment-14716421 ]

Ted Yu commented on HBASE-13153:

w.r.t. HFileReplicationEndPoint :
bq. After every configurable interval or max request size limit
Can you describe how the max request size limit would be monitored ?
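
For illustration only (this is not taken from the design document), one way such a limit could be tracked is to keep a running total for the pending batch and flush when either the total or the elapsed time crosses its threshold. The class name `HFileBatcher`, its parameters, and the idea that each queued entry reports its own size are all assumptions made for this sketch; how "request size" is actually measured is exactly what the question above asks.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class HFileBatcher:
    """Hypothetical batcher, not an HBase class.

    A batch is released when the accumulated size reaches max_request_bytes
    or when interval_seconds have passed since the first pending entry.
    """
    max_request_bytes: int
    interval_seconds: float
    clock: Callable[[], float] = time.monotonic
    _paths: List[str] = field(default_factory=list, init=False)
    _pending_bytes: int = field(default=0, init=False)
    _first_added_at: Optional[float] = field(default=None, init=False)

    def add(self, path: str, size_bytes: int) -> Optional[List[str]]:
        """Queue one HFile path; return a batch if a limit was hit, else None."""
        if self._first_added_at is None:
            self._first_added_at = self.clock()
        self._paths.append(path)
        self._pending_bytes += size_bytes
        return self.poll()

    def poll(self) -> Optional[List[str]]:
        """Check both limits; called on every add and by a periodic timer."""
        if not self._paths:
            return None
        too_big = self._pending_bytes >= self.max_request_bytes
        too_old = self.clock() - self._first_added_at >= self.interval_seconds
        if not (too_big or too_old):
            return None
        batch = self._paths
        self._paths = []
        self._pending_bytes = 0
        self._first_added_at = None
        return batch
```

In this sketch the size check happens on every `add`, while the interval check also needs an external timer to call `poll`, since nothing else would trigger a flush when no new entries arrive.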

bq. Peer cluster RS will receive the RPC request having multiple hfile paths
HFile paths are in ZK. Do we need to send the paths in RPC ?

bq. Peer RS will send the response with Success OR Failure paths
The response can be sent before HFile splitting is completed, right ?

bq. Inside hfiles node, there will be children node for every bulk loaded hfile name and hfile
path as its data.
Could there be collision between HFile names ?
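
To make the concern concrete, here is a small sketch under assumed paths: if the child node is named after the bare file name, two HFiles with the same name in different directories map to the same child. Deriving the child name from the full path (here by appending a SHA-256 digest of the path) is one possible way around that. Both function names and the example paths are hypothetical, not part of the proposal.

```python
import hashlib
import posixpath


def znode_name_by_filename(hfile_path: str) -> str:
    """Child node named after the bare file name (the scheme being questioned)."""
    return posixpath.basename(hfile_path)


def znode_name_by_path_digest(hfile_path: str) -> str:
    """Child node named after the file name plus a digest of the full path."""
    digest = hashlib.sha256(hfile_path.encode("utf-8")).hexdigest()
    return f"{posixpath.basename(hfile_path)}-{digest}"


# Two hypothetical bulk loads that happen to produce the same file name.
first = "/bulk/run1/cf/abc123"
second = "/bulk/run2/cf/abc123"

# Same child name, so the second entry would clash with the first.
collides = znode_name_by_filename(first) == znode_name_by_filename(second)

# Different child names, because the full paths differ.
distinct = znode_name_by_path_digest(first) != znode_name_by_path_digest(second)
```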

bq. Once the limit is reached, the new entries will not be queued.
This constraint is due to the limit on the amount of data that can be stored in ZK. Have you thought
of introducing a system table for recording information w.r.t. HFiles to be replicated ?

bq. During Scan there will not be any matching entry corresponding to “1” in Peer cluster
Visibility Tables. 
Index for visibility table entry could be different in peer cluster. Should visibility labels
be rewritten during the replication ?
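
As a rough sketch of what such a rewrite could look like (an assumption for illustration, not a description of what HBase does): translate each source-side ordinal to its label name, then look up the ordinal that the peer cluster assigned to the same name. The function name and the dictionary shapes below are hypothetical.

```python
from typing import Dict, List


def rewrite_label_ordinals(
    cell_ordinals: List[int],
    source_labels: Dict[int, str],
    peer_labels: Dict[str, int],
) -> List[int]:
    """Map source-cluster label ordinals to the peer cluster's ordinals.

    source_labels: ordinal -> label name on the source cluster.
    peer_labels:   label name -> ordinal on the peer cluster.
    """
    rewritten = []
    for ordinal in cell_ordinals:
        name = source_labels[ordinal]
        if name not in peer_labels:
            # The peer has no such label; the caller must decide what to do.
            raise KeyError(f"label {name!r} is not defined on the peer cluster")
        rewritten.append(peer_labels[name])
    return rewritten


# Hypothetical tables: the same labels were created in a different order.
source = {1: "secret", 2: "public"}
peer = {"public": 1, "secret": 2}
result = rewrite_label_ordinals([1], source, peer)
```

With these assumed tables, ordinal 1 on the source ("secret") becomes ordinal 2 on the peer, which is the mismatch the quoted sentence describes.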

bq. if again replicated from cluster-2 to active cluster, it will be accepted.
Could sequence Id be used so that the HFiles don't need to be written again ?

> enable bulkload to support replication
> --------------------------------------
>                 Key: HBASE-13153
>                 URL: https://issues.apache.org/jira/browse/HBASE-13153
>             Project: HBase
>          Issue Type: Bug
>          Components: API
>            Reporter: sunhaitao
>            Assignee: Ashish Singhi
>         Attachments: HBase Bulk Load Replication.pdf
> Currently we plan to use the HBase Replication feature to handle disaster-tolerance scenarios. But
we encounter an issue: we use bulkload very frequently, and because bulkload bypasses the write
path and does not generate WAL entries, the data will not be replicated to the backup cluster. It is
inappropriate to bulkload twice, on both the active cluster and the backup cluster. So I advise
modifying the bulkload feature so that a bulkload applies to both the active cluster and the backup cluster.

This message was sent by Atlassian JIRA