hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14417) Incremental backup and bulk loading
Date Thu, 08 Dec 2016 15:34:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15732531#comment-15732531 ]

Ted Yu commented on HBASE-14417:

A further response to Vlad's review comment about fault tolerance in bulk load.

When a bulk load fails midway, the user has to provide the complete set of hfiles again, because
the staging directory is not exposed to end users.
Given this, the benefit of using another hook (prior to postBulkLoadHFile()) to persist the
location of bulk loaded hfiles is minimal: in subsequent bulk load attempt(s), the same set of
(source) hfiles would be loaded again anyway.

Another factor is that the more writes there are to the hbase:backup table, the higher the chance
of a (write) failure.
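To make this concrete, suppose (purely as an illustrative assumption, not a measured figure) that
each write to hbase:backup fails independently with probability p. The chance that at least one of
n such writes fails is then:

```latex
P(\text{at least one failed write}) = 1 - (1 - p)^{n}
```

With a hypothetical p = 0.001, a single write fails with probability 0.001, while 10 writes fail
with probability 1 - 0.999^10, about 0.00996, roughly ten times higher.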

One optimization we can make in the future is to combine the writes (performed in
postBulkLoadHFile()) from several regions on the same region server, provided that these writes
are sufficiently close in time (e.g., within 300 ms of each other). The completion of a bulk load
on a single region server is determined by the slowest participating region, so this optimization
would keep the response time on par with the current implementation (where the hbase:backup table
is not involved).
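The write-combining idea can be sketched as grouping region-level completion records into batches,
each persisted with one write to hbase:backup. This is a hypothetical illustration, not HBase code:
BulkLoadWriteCoalescer, BulkLoadRecord and the hfile names are made up, records are assumed to be
sorted by completion time, and the window is anchored at the first record of each batch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not an HBase API): coalesce per-region bulk-load records on one
 * region server into batches, so each batch could be persisted with a single write to
 * hbase:backup instead of one write per region.
 */
public class BulkLoadWriteCoalescer {

    /** One completed region-level bulk load, as a post-bulk-load hook might report it. */
    public record BulkLoadRecord(String region, long completedAtMs, List<String> hfiles) {}

    private final long windowMs;

    public BulkLoadWriteCoalescer(long windowMs) {
        this.windowMs = windowMs;
    }

    /** Groups records (assumed sorted by completion time) into write batches. */
    public List<List<BulkLoadRecord>> coalesce(List<BulkLoadRecord> sorted) {
        List<List<BulkLoadRecord>> batches = new ArrayList<>();
        List<BulkLoadRecord> current = new ArrayList<>();
        long batchStart = 0;
        for (BulkLoadRecord r : sorted) {
            // Close the current batch once a record falls outside its time window.
            if (!current.isEmpty() && r.completedAtMs() - batchStart > windowMs) {
                batches.add(current);
                current = new ArrayList<>();
            }
            // The first record of a batch anchors the window.
            if (current.isEmpty()) {
                batchStart = r.completedAtMs();
            }
            current.add(r);
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        BulkLoadWriteCoalescer coalescer = new BulkLoadWriteCoalescer(300);
        List<BulkLoadRecord> records = List.of(
            new BulkLoadRecord("region-a", 1000, List.of("hfile-a1")),
            new BulkLoadRecord("region-b", 1120, List.of("hfile-b1")),
            new BulkLoadRecord("region-c", 1290, List.of("hfile-c1")),
            new BulkLoadRecord("region-d", 1700, List.of("hfile-d1")));
        List<List<BulkLoadRecord>> batches = coalescer.coalesce(records);
        // Prints "2 writes instead of 4"
        System.out.println(batches.size() + " writes instead of " + records.size());
    }
}
```

Anchoring the window at the first record bounds the extra wait for the earliest region to windowMs,
which is one way to keep response time close to the slowest region's completion time.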

> Incremental backup and bulk loading
> -----------------------------------
>                 Key: HBASE-14417
>                 URL: https://issues.apache.org/jira/browse/HBASE-14417
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 2.0.0
>            Reporter: Vladimir Rodionov
>            Assignee: Ted Yu
>            Priority: Critical
>              Labels: backup
>             Fix For: 2.0.0
>         Attachments: 14417-tbl-ext.v10.txt, 14417-tbl-ext.v9.txt, 14417.v1.txt, 14417.v11.txt,
14417.v13.txt, 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 14417.v6.txt
> Currently, incremental backup is based on WAL files. Bulk data loading bypasses WALs
for obvious reasons, breaking incremental backups. The only way to continue backups after
bulk loading is to create a new full backup of a table. This may not be feasible for customers
who do bulk loading regularly (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE

This message was sent by Atlassian JIRA
