hbase-issues mailing list archives

From "Amit Kabra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-19104) Add filtering during restore in HBase backups.
Date Thu, 02 Nov 2017 09:03:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235425#comment-16235425 ]

Amit Kabra commented on HBASE-19104:
------------------------------------

Yes, time range is important, but there are more important filters we could apply during
restore so that we do not have to restore everything.

The most important is the tenant name or tenant id. Whenever a restore is triggered, each row
is parsed and checked for the tenant id, and the row is written to the restore table only if
the tenant id matches.
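
A minimal sketch of that per-row check, combined with the time range filter, in plain Java
with no HBase dependencies. Everything here is an assumption made for illustration: the class
name TenantRestoreFilter is hypothetical and is not part of the HBase backup/restore API, and
the row key layout "<tenantId>:<rest>" is only one possible schema.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;

/** Hypothetical restore-time predicate; not an existing HBase class. */
final class TenantRestoreFilter {
    private final Set<String> tenantIds;
    private final long minTimestamp; // inclusive
    private final long maxTimestamp; // exclusive

    TenantRestoreFilter(Set<String> tenantIds, long minTimestamp, long maxTimestamp) {
        this.tenantIds = tenantIds;
        this.minTimestamp = minTimestamp;
        this.maxTimestamp = maxTimestamp;
    }

    /**
     * Extracts the tenant id, ASSUMING row keys look like "<tenantId>:<rest>".
     * Returns null when the key has no ':' delimiter.
     */
    static String tenantOf(byte[] rowKey) {
        String key = new String(rowKey, StandardCharsets.UTF_8);
        int idx = key.indexOf(':');
        return idx < 0 ? null : key.substring(0, idx);
    }

    /** True only if the row belongs to a requested tenant and the cell is in the time range. */
    boolean accept(byte[] rowKey, long cellTimestamp) {
        String tenant = tenantOf(rowKey);
        return tenant != null
                && tenantIds.contains(tenant)
                && cellTimestamp >= minTimestamp
                && cellTimestamp < maxTimestamp;
    }
}
```

A restore job would call accept() for every cell it reads from the backup and skip the cells
that are rejected. Where such a hook would live in the restore code path is left open here.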

Similarly, we should be able to pass a particular backup directory in HDFS, or a particular
HFile path, to restore only that much data. This can help in two ways:
1) Debugging cases in production where we suspect an issue with a particular backup: we can
restore only that part and check its validity instead of restoring everything.
2) Self-validation of backups after they are taken (HBASE-19106): we can restore a part of a
backup using these filters and validate it.
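
The path-based selection could be sketched as below. This is an illustration only: the class
BackupPathSelector and the example paths are hypothetical, paths are handled as plain strings
rather than through the Hadoop FileSystem API, and how the list of HFile paths for a backup is
obtained is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not an existing HBase class. */
final class BackupPathSelector {
    /**
     * Keeps only the HFile paths that are exactly {@code wanted} (a single file)
     * or that lie under {@code wanted} (a directory).
     */
    static List<String> select(List<String> hfilePaths, String wanted) {
        // Append '/' so that "/backup/b1" does not also match "/backup/b10/...".
        String dirPrefix = wanted.endsWith("/") ? wanted : wanted + "/";
        List<String> selected = new ArrayList<>();
        for (String path : hfilePaths) {
            if (path.equals(wanted) || path.startsWith(dirPrefix)) {
                selected.add(path);
            }
        }
        return selected;
    }
}
```

For example, given the made-up paths /backup/b1/t1/f1, /backup/b1/t1/f2 and /backup/b10/t1/f3,
selecting /backup/b1 keeps the first two, and selecting /backup/b1/t1/f2 keeps only that file.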

When we have a large amount of data in production, these filters help a lot.

> Add filtering during restore in HBase backups.
> ----------------------------------------------
>
>                 Key: HBASE-19104
>                 URL: https://issues.apache.org/jira/browse/HBASE-19104
>             Project: HBase
>          Issue Type: New Feature
>          Components: backup&restore
>            Reporter: Amit Kabra
>            Priority: Major
>             Fix For: 2.1.0
>
>
> When we deal with a large amount of data, it would be great if we could restore data
> from backups based on tenant, time range, etc., so that the restore finishes faster and
> we restore only what's required.
> Currently restore takes a backup id as input and restores all the data up to that backup
> id's timestamp. We may not need to restore all the data in a given backup id.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
