hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13639) SyncTable - rsync for HBase tables
Date Thu, 14 May 2015 22:32:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544488#comment-14544488 ]

Andrew Purtell commented on HBASE-13639:
----------------------------------------

+1

I tested this with two small clusters:

# Use LTT to initialize test tables on each cluster
# Use LTT to write 100000 rows starting from key 0 to cluster 1
# Run HashTable on cluster 1 writing hashes to cluster 2
# Run SyncTable on cluster 2 with sourcezkcluster=cluster 1
# Check row count on cluster 2, 100000 as expected
# Use LTT to write 100000 rows starting from key 100000 to cluster 1
# Run HashTable on cluster 1 writing hashes to cluster 2
# Run SyncTable on cluster 2 with sourcezkcluster=cluster 1
# Check row count on cluster 2, 200000 as expected
# Run LTT to update 20% of cells on cluster 1
# Run HashTable on cluster 1 writing hashes to cluster 2
# Run SyncTable on cluster 2 with sourcezkcluster=cluster 1
# Observed SyncTable pull updates from cluster 1 to cluster 2 and write back "old" cells from cluster 2 to cluster 1. Row count didn't change.

> SyncTable - rsync for HBase tables
> ----------------------------------
>
>                 Key: HBASE-13639
>                 URL: https://issues.apache.org/jira/browse/HBASE-13639
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Dave Latham
>            Assignee: Dave Latham
>             Fix For: 2.0.0, 0.98.14, 1.2.0
>
>         Attachments: HBASE-13639-0.98.patch, HBASE-13639-v1.patch, HBASE-13639-v2.patch, HBASE-13639.patch
>
>
> Given HBase tables in remote clusters with similar but not identical data, efficiently
> update a target table such that the data in question is identical to a source table.
> Efficiency in this context means using far less network traffic than would be required
> to ship all the data from one cluster to the other.  Takes inspiration from rsync.
> Design doc: https://docs.google.com/document/d/1-2c9kJEWNrXf5V4q_wBcoIXfdchN7Pxvxv1IO6PW0-U/
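
The rsync-style idea in the description above can be illustrated with a toy model. This is only a sketch under loud assumptions: tables are plain Python dicts with string row keys, and the names range_hash, hash_source and sync_target are hypothetical helpers, not the HBase HashTable/SyncTable API. It ignores cells, timestamps and MapReduce entirely, and only shows why comparing per-range hashes lets the target fetch just the ranges that differ.

```python
import hashlib


def _in_range(key, start, stop):
    # stop=None means the range is open-ended.
    return key >= start and (stop is None or key < stop)


def range_hash(table, start, stop):
    """Hash every (key, value) pair with start <= key < stop, in key order."""
    digest = hashlib.sha256()
    for key in sorted(table):
        if _in_range(key, start, stop):
            digest.update(repr((key, table[key])).encode("utf-8"))
    return digest.hexdigest()


def hash_source(source, batch_size):
    """Split the source key space into ranges and hash each range."""
    keys = sorted(source)
    # The first range starts at "" so it also covers any target keys
    # that sort before the first source key.
    starts = [""] + keys[batch_size::batch_size]
    ranges = []
    for i, start in enumerate(starts):
        stop = starts[i + 1] if i + 1 < len(starts) else None
        ranges.append((start, stop, range_hash(source, start, stop)))
    return ranges


def sync_target(source, target, ranges):
    """Make target match source, touching only ranges whose hashes differ.

    Reading from `source` here stands in for a remote read; the return
    value is the number of ranges that had to be shipped.
    """
    shipped = 0
    for start, stop, expected in ranges:
        if range_hash(target, start, stop) == expected:
            continue  # identical range, nothing crosses the network
        shipped += 1
        for key in [k for k in target if _in_range(k, start, stop)]:
            if key not in source:
                del target[key]
        for key in [k for k in source if _in_range(k, start, stop)]:
            target[key] = source[key]
    return shipped


# Usage: two nearly identical tables, only the differing ranges are copied.
source = {f"row{i:04d}": f"v{i}" for i in range(100)}
target = dict(source)
target["row0042"] = "stale"
ranges = hash_source(source, batch_size=10)
shipped = sync_target(source, target, ranges)
print(f"shipped {shipped} of {len(ranges)} ranges")
```

In this model only the hashes and the mismatched ranges need to move between clusters, which is the sense of "far less network traffic" in the description.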



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
