hbase-issues mailing list archives

From "Ashish Singhi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13153) enable bulkload to support replication
Date Mon, 31 Aug 2015 11:23:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723309#comment-14723309 ]

Ashish Singhi commented on HBASE-13153:

[~lhofhansl], thanks for the review and comments.

bq. What if that notification is missed? For example the RS dies just then? WAL replication
does not have this issue since it always deals with all existing WALs so it cannot miss anything.
We will notify only after the HFile has been loaded successfully, and only then return to the
client. So if the RS dies at that point, the complete bulk load will fail and the client has to retry.
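
To illustrate the ordering described above, here is a minimal sketch in Java. All names in it
(BulkLoadSketch, Loader, Notifier, loadWithRetry) are hypothetical and are not HBase classes. It
also assumes that retrying a load is safe, which this comment does not state.

```java
// Hypothetical names throughout; none of these are HBase classes.
class BulkLoadSketch {
    /** Stand-in for moving the HFile into the region. */
    interface Loader {
        void load(String hfilePath) throws Exception;
    }

    /** Stand-in for the replication notification hook. */
    interface Notifier {
        void notifyLoaded(String hfilePath) throws Exception;
    }

    /**
     * Loads the file, then notifies, and only then returns normally.
     * Any exception (for example the server dying mid-call) reaches the
     * caller, so the client never sees success without a notification.
     */
    static void bulkLoad(String hfilePath, Loader loader, Notifier notifier) throws Exception {
        loader.load(hfilePath);            // step 1: load the HFile
        notifier.notifyLoaded(hfilePath);  // step 2: record it for replication
        // step 3: returning normally is the acknowledgement to the client
    }

    /** Client-side retry: repeat until one attempt succeeds; returns the attempt number. */
    static int loadWithRetry(String hfilePath, Loader loader, Notifier notifier,
                             int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                bulkLoad(hfilePath, loader, notifier);
                return attempt;
            } catch (Exception e) {
                last = e; // treat as a failed bulk load and retry (assumes retry is safe)
            }
        }
        throw last;
    }
}
```

Because the notification happens before the call returns, a lost notification always shows up to
the client as a failed bulk load rather than as a silent gap in replication.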

bq. So you'll send the HFile over RPCs? These files can be huge. Can we use HDFS' distCP here?
No, we will send only the paths of the HFiles in the source cluster.

bq. Can we simply use the standard bulk load mechanism here? It would split the files as necessary.
Yes, we plan to use the complete bulk load tool mechanism, wherein the peer cluster will act as the
complete bulk load client.
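
The two answers above (only paths are sent, and the peer acts as the bulk load client) can be
sketched roughly as follows. The types BulkLoadNotice and BulkLoadClient are hypothetical and are
not HBase classes. How the peer actually reads the files at those source paths is not covered by
this comment and is left open here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration; not HBase classes.
class PeerBulkLoadSketch {
    /** What travels over RPC: a table name and source-cluster paths, no file contents. */
    static final class BulkLoadNotice {
        final String table;
        final List<String> sourcePaths;

        BulkLoadNotice(String table, List<String> sourcePaths) {
            this.table = table;
            this.sourcePaths = new ArrayList<>(sourcePaths); // defensive copy
        }
    }

    /** Stand-in for the complete bulk load tool, run on the peer cluster. */
    interface BulkLoadClient {
        void completeBulkLoad(String table, String sourcePath);
    }

    /** Peer side: act as a bulk load client for each path in the notice. */
    static int replay(BulkLoadNotice notice, BulkLoadClient client) {
        int loaded = 0;
        for (String path : notice.sourcePaths) {
            client.completeBulkLoad(notice.table, path);
            loaded++;
        }
        return loaded;
    }
}
```

The size of the notice depends only on the number of paths, not on the size of the HFiles.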

bq. You'll need to ensure this somehow.
The plan is to have our own implementation of BaseLogCleanerDelegate#getDeletableFiles to
ensure this.
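
As a rough sketch of that idea, assuming the point is to keep files from being cleaned up while a
peer still needs them: the class below is a hypothetical stand-in, not the HBase cleaner API. It
uses plain path strings so that it stays self-contained, and the set of pending paths is assumed
to be tracked elsewhere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in, not the HBase cleaner API.
class PendingReplicationFilter {
    // Paths that at least one peer has not replicated yet (assumed to be tracked elsewhere).
    private final Set<String> pendingReplication;

    PendingReplicationFilter(Set<String> pendingReplication) {
        this.pendingReplication = pendingReplication;
    }

    /** Returns only the candidates that no peer still needs, preserving input order. */
    List<String> getDeletableFiles(List<String> candidates) {
        List<String> deletable = new ArrayList<>();
        for (String path : candidates) {
            if (!pendingReplication.contains(path)) {
                deletable.add(path);
            }
        }
        return deletable;
    }
}
```

A cleaner built this way never offers a file for deletion while it is still listed as pending.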

bq. That can lead to very tricky issues where the same files just go from cluster to cluster
in a never ending cycle. We know at the source that the HFiles came from a bulk load, maybe
we can handle that specially.
Cyclic replication is a limitation as of now. We are still thinking about how we can handle this.

bq. Lastly, it might be generally a good option to copy HFiles around, rather than WALs (at
least for some setups). Could we use this to do that?
The design will support this. Currently we are adding this to bulk load only. If required, we
can extend this hook.

> enable bulkload to support replication
> --------------------------------------
>                 Key: HBASE-13153
>                 URL: https://issues.apache.org/jira/browse/HBASE-13153
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>            Reporter: sunhaitao
>            Assignee: Ashish Singhi
>             Fix For: 2.0.0
>         Attachments: HBase Bulk Load Replication.pdf
> Currently we plan to use the HBase Replication feature to deal with a disaster tolerance scenario. But
we encounter an issue: we will use bulkload very frequently, and because bulkload bypasses the write
path and does not generate WAL, the data will not be replicated to the backup cluster. It's
inappropriate to bulkload twice, on both the active cluster and the backup cluster. So I advise making some
modifications to the bulkload feature to enable bulkload to both the active cluster and the backup cluster.
