hbase-user mailing list archives

From Timothy Brown <...@siftscience.com>
Subject Re: Deleting and cleaning old snapshots exported to S3
Date Wed, 22 Nov 2017 20:52:32 GMT
Hi Lex,

We had a similar issue with our S3 bucket growing in size and we wrote our
own cleaner. The cleaner first looks at the HFiles required by the current
snapshots. We then figure out which snapshots we no longer want (for
example snapshots older than a week or whatever rules you want). Then we
find the HFiles that are only referenced by these unwanted snapshots and
delete these HFiles from S3.
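
The selection step described above could be sketched roughly as follows. This is a minimal illustration in plain Java with no HBase calls. The class name SnapshotCleanerSketch, the method filesSafeToDelete, and the map of snapshot name to HFile names are all hypothetical, and it assumes you have already collected the HFile names for each snapshot by some other means.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of HBase: selects the HFiles that are
// referenced only by snapshots we no longer want.
class SnapshotCleanerSketch {

    /**
     * @param filesBySnapshot snapshot name -> names of the HFiles it references
     * @param unwanted        names of the snapshots we want to drop
     * @return HFiles referenced by at least one unwanted snapshot
     *         and by no snapshot we are keeping
     */
    static Set<String> filesSafeToDelete(Map<String, Set<String>> filesBySnapshot,
                                         Set<String> unwanted) {
        Set<String> keep = new HashSet<>();
        Set<String> candidates = new HashSet<>();
        for (Map.Entry<String, Set<String>> entry : filesBySnapshot.entrySet()) {
            if (unwanted.contains(entry.getKey())) {
                candidates.addAll(entry.getValue());
            } else {
                keep.addAll(entry.getValue());
            }
        }
        // Anything still needed by a kept snapshot must survive.
        candidates.removeAll(keep);
        return candidates;
    }
}
```

Building the "keep" set from every remaining snapshot first, and only then subtracting, is what stops a file shared between an old and a new snapshot from being deleted.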

The tricky part is finding the HFiles for a given snapshot. There are two
ways to do this.

1) Use:

SnapshotDescription snapshotDesc =
    SnapshotDescriptionUtils.readSnapshotInfo(fs, snapshotDir);
SnapshotReferenceUtil.visitReferencedFiles(conf, fs, snapshotDir,
    snapshotDesc, snapshotVisitor);

where snapshotVisitor is an implementation of the SnapshotVisitor interface
found here:
https://github.com/cloudera/hbase/blob/cdh5-1.2.0_5.11.1/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotReferenceUtil.java#L63

2) The ExportSnapshot class provides a private method that does this for
you. We ended up using reflection to make ExportSnapshot#getSnapshotFiles
public (see
https://github.com/cloudera/hbase/blob/cdh5-1.2.0_5.11.1/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#L539).
For example:

Path snapshotPath = getCompletedSnapshotDir(snapshotName, rootDir);
Method method = ExportSnapshot.class.getDeclaredMethod("getSnapshotFiles",
    Configuration.class, FileSystem.class, Path.class);
method.setAccessible(true);
// invoke() returns Object, so the result needs an (unchecked) cast
@SuppressWarnings("unchecked")
List<Pair<SnapshotFileInfo, Long>> snapshotFiles =
    (List<Pair<SnapshotFileInfo, Long>>) method.invoke(null, conf, fs, snapshotPath);

I would love to know how other people are tackling this issue as well.

-Tim

On Mon, Nov 20, 2017 at 7:45 PM, Lex Toumbourou <lex@scrunch.com> wrote:

> Hi all,
>
> Wondering if I could get some help figuring out how to clean out old
> snapshots that have been exported to S3?
>
> We've been exporting snapshots to S3 using the export snapshot command:
>
> bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
> some-snapshot -copy-to s3a://some-bucket/hbase
>
>
> Now the size of the S3 bucket is getting a little out of control and I'd
> like to remove the old snapshots and let HBase garbage collect blocks no
> longer referenced.
>
> One idea I had was to spin up an entirely new cluster that uses the S3
> bucket as the hbase.rootdir then just delete the snapshots as normal and
> maybe use cleaner_run to clean up the old files but it feels like overkill
> having to spin up an entire cluster.
>
> So my question is: what's the best approach for deleting snapshots exported
> to an s3 bucket and cleaning old store files no longer referenced? We are
> using HBase 1.3.1 on EMR.
>
> Thanks!
>
> Lex Toumbourou, CTO at scrunch.com <http://scrunch.com/>
>
