hbase-issues mailing list archives

From "Amit Kabra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-19105) Add ability to compare backups in HBase backups.
Date Thu, 02 Nov 2017 09:42:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-19105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235462#comment-16235462 ]

Amit Kabra commented on HBASE-19105:

> it is not possible to execute two identical backups on different clusters

We could create time-range buckets and assign data to them by timestamp, then compare the data
bucket by bucket. For example, if we have data at time 2 and at time 8, and the buckets are 0_5 and
5_10, then we put 2 in the 0_5 bucket and 8 in the 5_10 bucket. When comparing, we compare
only the corresponding buckets in primary and DR.
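
To make the bucket idea concrete, here is a rough Python sketch. It is not an existing HBase tool or API: the function names, the (timestamp, row key, value) tuple layout, the half-open bucket boundaries (a timestamp of 5 lands in 5_10) and the use of SHA-256 digests are all assumptions made for illustration.

```python
from collections import defaultdict
import hashlib


def bucket_label(ts, width):
    # Half-open buckets [start, start + width): an assumption, the comment
    # above does not say which bucket a boundary timestamp belongs to.
    start = (ts // width) * width
    return f"{start}_{start + width}"


def bucket_digests(cells, width):
    # cells: iterable of hypothetical (timestamp, row_key, value) tuples.
    grouped = defaultdict(list)
    for ts, row, value in cells:
        grouped[bucket_label(ts, width)].append((ts, row, value))
    digests = {}
    for label, items in grouped.items():
        h = hashlib.sha256()
        # Sort so the digest does not depend on the order cells arrived in.
        for item in sorted(items):
            h.update(repr(item).encode("utf-8"))
        digests[label] = h.hexdigest()
    return digests


def differing_buckets(primary_cells, dr_cells, width):
    # Compare only corresponding buckets; a bucket missing on one side
    # counts as a difference.
    p = bucket_digests(primary_cells, width)
    d = bucket_digests(dr_cells, width)
    return sorted(label for label in p.keys() | d.keys()
                  if p.get(label) != d.get(label))
```

With a width of 5, data at time 2 falls into 0_5 and data at time 8 into 5_10, matching the example above. If DR is missing the cell at time 8, only the 5_10 bucket is reported as different.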

> they will always be slightly different due to the replication lag.

Replication delay: yes, that will be there, but it will not always be on the order of days; if it
is, that is an issue in itself. Assuming an acceptable delay of minutes or hours, we
can always compare data from some time back.
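
As a sketch of what comparing older data could look like: pick only the time-range buckets that closed before (now - acceptable lag), so that data still in flight to DR is left out of the comparison. The function name, the start_end label format and the parameters below are hypothetical, chosen for illustration only.

```python
def comparable_buckets(labels, now, max_lag):
    # labels: bucket names in the assumed "start_end" form, e.g. "0_5".
    # A bucket is safe to compare only if it ended at or before the cutoff,
    # i.e. replication has had at least max_lag time to catch up on it.
    cutoff = now - max_lag
    safe = []
    for label in labels:
        start, end = (int(part) for part in label.split("_"))
        if end <= cutoff:
            safe.append(label)
    return safe
```

For instance, with buckets 0_5, 5_10 and 10_15, a current time of 14 and an acceptable lag of 4, the cutoff is 10, so only 0_5 and 5_10 would be compared.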

>  I would suggest doing backups only in DR cluster. In this case you will always have
single source of backed up data.

For site switching, the primary and DR sites should be in sync at all times so that we can switch.
If we do backups only in DR, then in the worst case, what if DR goes down? We cannot simply initiate
new backups, since they would not contain all the past data (deleted data, expired data, versions
beyond max versions, etc.).

Not saying it's easy or straightforward, though.

> Add ability to compare backups in HBase backups.
> ------------------------------------------------
>                 Key: HBASE-19105
>                 URL: https://issues.apache.org/jira/browse/HBASE-19105
>             Project: HBase
>          Issue Type: New Feature
>          Components: backup&restore
>            Reporter: Amit Kabra
>            Priority: Major
> For certain scenarios, e.g. a DR scenario, before making a site switch we need to ensure that
the backups in primary and DR are the same. A tool that compares backups and can do cross-cluster
backup validation would help in such cases.
> Current backups generate data in backup_<timestamp> format; this can differ
between primary and DR and is not easily comparable.

This message was sent by Atlassian JIRA
