hbase-issues mailing list archives

From "Jianwei Cui (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15469) Take snapshot by family
Date Mon, 21 Mar 2016 08:10:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203878#comment-15203878 ]

Jianwei Cui commented on HBASE-15469:

For our case, the goal is to copy existing data for the given families and clone the snapshot,
so creating a new table with only the subset of families is the better choice. For the restore
case, is the goal to roll the table back to some historical state? A snapshot with only a subset
of families may not represent any historical state of the table, so it should not be used for
restore.
We may block the restore of snapshots that contain only a subset of families, and that will
resolve the strange restore situation.
And when we clone, we just create a new table with only the subset. In theory this is
clearer for the end user.
I agree with your analysis [~mbertozzi], and I also welcome other opinions and use cases. Thanks!
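
The rule discussed above (always allow clone, block restore for snapshots that hold only a subset of families) can be sketched as a small predicate. This is only an illustration: `PartialSnapshotPolicy`, `Operation`, `isPartial` and `isAllowed` are hypothetical names, not HBase APIs and not taken from the attached patches, and the sketch assumes the table's family set at snapshot time is known.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical model of the clone/restore rule discussed in this comment.
 * None of these names come from HBase or from the attached patches.
 */
final class PartialSnapshotPolicy {

    enum Operation { CLONE, RESTORE }

    /** A snapshot is partial when it omits a family the table had when it was taken. */
    static boolean isPartial(Set<String> tableFamilies, Set<String> snapshotFamilies) {
        return !snapshotFamilies.containsAll(tableFamilies);
    }

    /** Clone is always allowed; restore is allowed only for whole-table snapshots. */
    static boolean isAllowed(Operation op, Set<String> tableFamilies, Set<String> snapshotFamilies) {
        return op == Operation.CLONE || !isPartial(tableFamilies, snapshotFamilies);
    }

    public static void main(String[] args) {
        Set<String> table = new HashSet<>(Arrays.asList("FamilyA", "FamilyB", "FamilyC"));
        Set<String> partial = new HashSet<>(Arrays.asList("FamilyA", "FamilyB"));
        System.out.println("clone partial:   " + isAllowed(Operation.CLONE, table, partial));
        System.out.println("restore partial: " + isAllowed(Operation.RESTORE, table, partial));
        System.out.println("restore full:    " + isAllowed(Operation.RESTORE, table, table));
    }
}
```

Under this sketch a partial snapshot can only ever produce a new table, so it never has to stand in for a historical state of the original table.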

> Take snapshot by family
> -----------------------
>                 Key: HBASE-15469
>                 URL: https://issues.apache.org/jira/browse/HBASE-15469
>             Project: HBase
>          Issue Type: Improvement
>          Components: snapshots
>    Affects Versions: 2.0.0
>            Reporter: Jianwei Cui
>         Attachments: HBASE-15469-v1.patch, HBASE-15469-v2.patch
> In our production environment, there are some 'wide' tables in the offline cluster. A 'wide'
table has many families, and different applications access different families of the
table through MapReduce. When an application starts to provide online service, we need
to copy the needed families from the offline cluster to the online cluster. For future writes, inter-cluster
replication supports setting families per table, so we can use it to copy future edits for the needed
families. For existing data, we can take a snapshot of the table on the offline cluster, then use
{{ExportSnapshot}} to copy the snapshot to the online cluster and clone it. However, we
can only take a snapshot of the whole table, in which many families are not needed by the application;
this leads to unnecessary data copying. I think it would be useful to support taking a snapshot by family,
so that we copy only the needed data.
> A possible solution to support this:
> 1. Add a family names field to the protobuf definition of {{SnapshotDescription}}
> 2. Allow setting families when taking a snapshot in the hbase shell, such as:
> {code}
>    snapshot 'tableName', 'snapshotName', 'FamilyA', 'FamilyB', {SKIP_FLUSH => true}
> {code}
> 3. Add the family names to {{SnapshotDescription}} on the client side
> 4. Read the family names from {{SnapshotDescription}} in the Master/Regionserver, and keep only the requested
families when taking the snapshot of each region.
> Discussion and suggestions are welcome.
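
The filtering in step 4 of the proposal above could look roughly like the following self-contained sketch. It does not use real HBase classes: `FamilySnapshotSketch` and `selectFamilies` are hypothetical, a region is modeled as a plain map from family name to store file names, and treating an empty family list as "snapshot the whole table" is an assumption of this sketch rather than something the issue specifies.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of step 4; not the HBase implementation or the attached patch. */
final class FamilySnapshotSketch {

    /**
     * Returns the subset of a region's store files that belongs to the requested families.
     * Assumption: an empty request means "all families", i.e. a whole-table snapshot.
     */
    static Map<String, List<String>> selectFamilies(
            Map<String, List<String>> storeFilesByFamily, Set<String> requestedFamilies) {
        if (requestedFamilies.isEmpty()) {
            return new TreeMap<>(storeFilesByFamily);
        }
        // Reject families the table does not have rather than silently ignoring them.
        for (String family : requestedFamilies) {
            if (!storeFilesByFamily.containsKey(family)) {
                throw new IllegalArgumentException("Unknown family: " + family);
            }
        }
        Map<String, List<String>> kept = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : storeFilesByFamily.entrySet()) {
            if (requestedFamilies.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> region = new HashMap<>();
        region.put("FamilyA", Arrays.asList("a-file-1", "a-file-2"));
        region.put("FamilyB", Arrays.asList("b-file-1"));
        region.put("FamilyC", Arrays.asList("c-file-1"));

        Set<String> requested = new HashSet<>(Arrays.asList("FamilyA", "FamilyB"));
        // Only FamilyA and FamilyB would be referenced by the snapshot of this region.
        System.out.println(selectFamilies(region, requested).keySet());
    }
}
```

Failing on an unknown family is one possible design choice; the alternative of skipping it silently would make a mistyped family name produce an unexpectedly small snapshot.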

This message was sent by Atlassian JIRA
