hbase-issues mailing list archives

From "Dave Latham (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13639) SyncTable - rsync for HBase tables
Date Thu, 04 Jun 2015 18:22:38 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573319#comment-14573319 ]

Dave Latham commented on HBASE-13639:
-------------------------------------

We've used this tool to repair some very large tables across a WAN link.  It can be challenging
to run against a table that is receiving live writes if those writes update or overwrite
existing cells.  In general, you can restrict the run to a time range so that new writes are
ignored.  But if those writes update existing cells, the time-range scan may or may not see
the older versions of those cells, depending on whether a major compaction has happened, and
that can differ between the two clusters.
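
To make the compaction caveat concrete, here is a minimal Python sketch.  It is a toy
in-memory model, not HBase code: the names `major_compact` and `scan_time_range` are
hypothetical, and it assumes a column family that keeps a single version per cell.

```python
from typing import List, Optional, Tuple

# Hypothetical model of one cell: a list of (timestamp, value) versions.
Versions = List[Tuple[int, str]]


def major_compact(versions: Versions, max_versions: int = 1) -> Versions:
    """Keep only the newest max_versions versions (assumed compaction effect)."""
    return sorted(versions, reverse=True)[:max_versions]


def scan_time_range(versions: Versions, start: int, end: int) -> Optional[str]:
    """Return the newest value with start <= timestamp < end, or None."""
    in_range = [(ts, value) for ts, value in versions if start <= ts < end]
    return max(in_range)[1] if in_range else None


# The same cell on both clusters: written at t=100, overwritten at t=200.
source = [(100, "old"), (200, "new")]
target = [(100, "old"), (200, "new")]

# Only the source cluster has run a major compaction.
source = major_compact(source)

# Scan [0, 150) to ignore the live write at t=200.
source_view = scan_time_range(source, 0, 150)
target_view = scan_time_range(target, 0, 150)

# The clusters hold the same latest data, yet the time-range views disagree.
print(source_view, target_view)
```

In this model the compacted side no longer has any version inside the time range, so a
comparison restricted to that range reports a difference that is not really there.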

> SyncTable - rsync for HBase tables
> ----------------------------------
>
>                 Key: HBASE-13639
>                 URL: https://issues.apache.org/jira/browse/HBASE-13639
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Dave Latham
>            Assignee: Dave Latham
>             Fix For: 2.0.0, 0.98.14, 1.2.0
>
>         Attachments: HBASE-13639-0.98.patch, HBASE-13639-v1.patch, HBASE-13639-v2.patch,
>                      HBASE-13639-v3.patch, HBASE-13639.patch
>
>
> Given HBase tables in remote clusters with similar but not identical data, efficiently
> update a target table such that the data in question is identical to a source table.  Efficiency
> in this context means using far less network traffic than would be required to ship all the
> data from one cluster to the other.  Takes inspiration from rsync.
> Design doc: https://docs.google.com/document/d/1-2c9kJEWNrXf5V4q_wBcoIXfdchN7Pxvxv1IO6PW0-U/
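
The rsync-style idea in the description can be illustrated with a small Python sketch.
This is not the tool's implementation: tables are plain dicts, the split points in
`boundaries` are made up, and SHA-256 is an arbitrary choice of hash.  Each side hashes
its rows per key range, only the hashes are compared, and rows are shipped only for
ranges whose hashes differ.

```python
import hashlib
from bisect import bisect_right
from typing import Dict, List


def range_index(row_key: str, boundaries: List[str]) -> int:
    """Index of the key range that row_key falls into."""
    return bisect_right(boundaries, row_key)


def hash_ranges(table: Dict[str, str], boundaries: List[str]) -> List[str]:
    """One digest per key range, computed over the sorted rows in that range."""
    digests = [hashlib.sha256() for _ in range(len(boundaries) + 1)]
    for key in sorted(table):
        digest = digests[range_index(key, boundaries)]
        digest.update(key.encode() + b"\x00" + table[key].encode() + b"\x00")
    return [d.hexdigest() for d in digests]


def sync(source: Dict[str, str], target: Dict[str, str], boundaries: List[str]) -> int:
    """Make target identical to source; return the number of rows shipped."""
    shipped = 0
    pairs = zip(hash_ranges(source, boundaries), hash_ranges(target, boundaries))
    for i, (src_hash, tgt_hash) in enumerate(pairs):
        if src_hash == tgt_hash:
            continue  # range already matches, nothing crosses the network
        src_rows = {k: v for k, v in source.items() if range_index(k, boundaries) == i}
        stale = [k for k in target if range_index(k, boundaries) == i and k not in src_rows]
        for key in stale:
            del target[key]  # row exists only on the target
        target.update(src_rows)
        shipped += len(src_rows)
    return shipped


source = {"a": "1", "b": "2", "m": "3", "z": "4"}
target = {"a": "1", "b": "2", "m": "X", "q": "9", "z": "4"}
shipped = sync(source, target, boundaries=["h", "s"])
print(shipped, target == source)
```

Only the middle range differs here, so a single row is shipped instead of all four.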



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
