hbase-issues mailing list archives

From "Appy (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
Date Fri, 01 Dec 2017 04:00:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273901#comment-16273901
] 

Appy edited comment on HBASE-17852 at 12/1/17 3:59 AM:
-------------------------------------------------------

A few questions:
Pardon me if my high-level analysis of the design is off. Is the following a correct description
of the current design?
Start bulk load from the client -> each RS gets its RPC to prepare and then does the actual bulk load
-> internally, when the bulk load is done, BackupObserver#postBulkLoadHFile writes the paths to the
backup table.
And to avoid full backup failures from affecting incremental backups (due to snapshot restore),
you are putting the bulk-loaded paths data in a separate table, right?
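The flow above can be sketched in plain Java. This is a simplified stand-in for illustration only, not the real HBase coprocessor API: `PostBulkLoadSketch`, `regionServerBulkLoad`, and the use of a `List<String>` for the tracking table are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PostBulkLoadSketch {
    /** Stand-in for the separate backup table that tracks bulk-loaded files. */
    static final List<String> backupTable = new ArrayList<>();

    /** Stand-in for BackupObserver#postBulkLoadHFile: runs on the RS after the load. */
    static void postBulkLoadHFile(List<String> loadedPaths) {
        backupTable.addAll(loadedPaths); // record paths so incremental backup can pick them up
    }

    /** One RS handling its share of the bulk load: prepare, load, then notify. */
    static void regionServerBulkLoad(List<String> hfilePaths) {
        // prepare + actual HFile load elided
        postBulkLoadHFile(hfilePaths);
    }
}
```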

----
There were concerns above about the cross-RS RPC to write the paths; I was trying to think of the
easiest way to avoid it. How about returning the [map as part of the response here|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L2251]
and then issuing the RPC to the master from the client side? It's easier and safer to retry from
the client side if the remote resource isn't available.
I'd suggest going one step further, an easy one though: collect all the paths on the client side
and do a single put request. That will give two benefits:
- It makes the incremental backup transactional
- If the put fails repeatedly, you can either fail the bulk load altogether, or throw an error
telling the user that these bulk-loaded files failed to back up and that only a full backup will
include them.
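A minimal sketch of the suggested client-side flow, using plain Java collections in place of real HBase types: each RS returns the paths it bulk loaded, the client merges them, and records them with a single write, retrying a few times and failing loudly if the write never succeeds. `RsResponse`, `recordBulkLoadedPaths`, and `MAX_RETRIES` are hypothetical names, not existing HBase API.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkLoadPathCollector {
    static final int MAX_RETRIES = 3;

    /** Hypothetical per-RS response carrying the paths that server bulk loaded. */
    record RsResponse(List<String> loadedPaths) {}

    /** Merge every RS response into one list and record it with a single put. */
    static List<String> collectAndRecord(List<RsResponse> responses) {
        List<String> allPaths = new ArrayList<>();
        for (RsResponse r : responses) {
            allPaths.addAll(r.loadedPaths());
        }
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            if (recordBulkLoadedPaths(allPaths)) { // single put: all-or-nothing
                return allPaths;
            }
        }
        // The put failed repeatedly: surface the failure instead of silently losing paths.
        throw new IllegalStateException("Failed to record " + allPaths.size()
            + " bulk-loaded files; only a full backup will include them.");
    }

    /** Stand-in for the single put against the bulk-load tracking table. */
    static boolean recordBulkLoadedPaths(List<String> paths) {
        return !paths.isEmpty(); // pretend the write succeeds for non-empty batches
    }
}
```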
----
What happens if, during an ongoing backup, I create some backup sets, but then the backup fails?
Will the snapshot restore remove my backup sets?



> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-17852
>                 URL: https://issues.apache.org/jira/browse/HBASE-17852
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0
>
>         Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, HBASE-17852-v3.patch,
HBASE-17852-v4.patch, HBASE-17852-v5.patch, HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch,
HBASE-17852-v9.patch
>
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before a backup create/delete/merge starts, we take a snapshot of the backup meta-table
(the backup system table). This procedure is lightweight because the meta table is small and
usually fits in a single region.
> # When an operation fails on the server side, we handle the failure by cleaning up partial
data in the backup destination, followed by restoring the backup meta-table from the snapshot.
> # When an operation fails on the client side (abnormal termination, for example), the next
time the user tries a create/merge/delete they will see an error message that the system is in
an inconsistent state and repair is required; they will need to run the backup repair tool.
> # To avoid multiple writers to the backup system table (the backup client and the BackupObservers),
we introduce a small table ONLY to keep the listing of bulk-loaded files. All backup observers will
work only with this new table. The reason: in case of a failure during a backup create/delete/merge/restore,
when the system performs automatic rollback, some data written by backup observers during the failed
operation may be lost. This is what we try to avoid.
> # The second table keeps only bulk-load-related references. We do not care about the consistency
of this table, because bulk load is an idempotent operation and can be repeated after a failure.
Partially written data in the second table does not affect the BackupHFileCleaner plugin, because
this data (the list of bulk-loaded files) corresponds to files which have not yet been loaded
successfully and, hence, are not visible to the system.
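Steps 1 and 2 of the rollback-via-snapshot design above can be sketched in plain Java; `MetaTable`, `snapshot()`, `restore()`, and `runWithRollback` are simplified stand-ins, not the real HBase snapshot API.

```java
import java.util.HashMap;
import java.util.Map;

public class RollbackViaSnapshot {
    /** Minimal stand-in for the backup meta-table. */
    static class MetaTable {
        final Map<String, String> rows = new HashMap<>();
        Map<String, String> snapshot() { return new HashMap<>(rows); } // cheap: the table is small
        void restore(Map<String, String> snap) { rows.clear(); rows.putAll(snap); }
    }

    /** Run a backup operation under snapshot protection (steps 1 and 2 above). */
    static void runWithRollback(MetaTable meta, Runnable operation) {
        Map<String, String> snap = meta.snapshot(); // step 1: snapshot before the operation
        try {
            operation.run();
        } catch (RuntimeException serverSideFailure) {
            meta.restore(snap); // step 2: roll the meta-table back on server-side failure
            throw serverSideFailure;
        }
        // Step 3 (client-side crash) is not representable here: the next client
        // would find the system inconsistent and have to run the repair tool.
    }
}
```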



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
