hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Deleting and cleaning old snapshots exported to S3
Date Wed, 22 Nov 2017 22:03:03 GMT
Logged HBASE-19333.


On Wed, Nov 22, 2017 at 1:11 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> For getSnapshotFiles, it returns a protobuf class. That is why it is
> private.
>
> If we create a POJO class to wrap the SnapshotFileInfo it returns, I
> think the method can become public.
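>
> A minimal sketch of what such a POJO might look like (the class name
> and fields here are hypothetical, not an existing HBase API):
>
> public final class SnapshotFileRef {
>   private final String hfileName; // store file name from the manifest
>   private final long fileSize;    // size recorded for the file
>
>   public SnapshotFileRef(String hfileName, long fileSize) {
>     this.hfileName = hfileName;
>     this.fileSize = fileSize;
>   }
>
>   public String getHfileName() { return hfileName; }
>   public long getFileSize() { return fileSize; }
> }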
>
> Cheers
>
> -------- Original message --------
> From: Timothy Brown <tim@siftscience.com>
> Date: 11/22/17 12:52 PM (GMT-08:00)
> To: user@hbase.apache.org
> Subject: Re: Deleting and cleaning old snapshots exported to S3
>
> Hi Lex,
>
> We had a similar issue with our S3 bucket growing in size, so we wrote
> our own cleaner. The cleaner first collects the HFiles referenced by each
> current snapshot. We then figure out which snapshots we no longer want
> (for example, snapshots older than a week, or whatever rules you want).
> Then we find the HFiles that are referenced only by these unwanted
> snapshots and delete those HFiles from S3.
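>
> In outline it is just a set difference over the snapshot manifests.
> Roughly (listSnapshots, isExpired, hfilesFor, and deleteFromS3 are
> hypothetical placeholders, not our actual code):
>
> Set<String> keep = new HashSet<>();
> for (String snapshot : listSnapshots()) {
>   if (!isExpired(snapshot)) {
>     keep.addAll(hfilesFor(snapshot)); // HFiles still needed
>   }
> }
> for (String snapshot : listSnapshots()) {
>   if (isExpired(snapshot)) {
>     for (String hfile : hfilesFor(snapshot)) {
>       if (!keep.contains(hfile)) {
>         deleteFromS3(hfile); // referenced only by expired snapshots
>       }
>     }
>   }
> }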
>
> The tricky part is finding the HFiles for a given snapshot. There are
> two ways to do this.
>
> 1) Use:
>
> SnapshotDescription snapshotDesc =
>     SnapshotDescriptionUtils.readSnapshotInfo(fs, snapshotDir);
> SnapshotReferenceUtil.visitReferencedFiles(conf, fs, snapshotDir,
>     snapshotDesc, snapshotVisitor);
>
> where snapshotVisitor is an implementation of the SnapshotVisitor
> interface found here:
> https://github.com/cloudera/hbase/blob/cdh5-1.2.0_5.11.1/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotReferenceUtil.java#L63
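>
> For instance, a visitor that just collects store file names might look
> something like this (written against the 1.2-era callbacks; check the
> interface in your exact version):
>
> final Set<String> hfiles = new HashSet<>();
> SnapshotReferenceUtil.SnapshotVisitor snapshotVisitor =
>     new SnapshotReferenceUtil.SnapshotVisitor() {
>       public void storeFile(HRegionInfo regionInfo, String familyName,
>           SnapshotRegionManifest.StoreFile storeFile) throws IOException {
>         hfiles.add(storeFile.getName()); // record each referenced HFile
>       }
>       // Some 1.x versions also require a WAL callback via
>       // FSVisitor.LogFileVisitor; it is a no-op here.
>       public void logFile(String server, String logfile) throws IOException {
>       }
>     };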
>
> 2) The ExportSnapshot class provides a private method that does this for
> you. We ended up using reflection to make ExportSnapshot#getSnapshotFiles
> accessible (see
> https://github.com/cloudera/hbase/blob/cdh5-1.2.0_5.11.1/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#L539).
> For example:
>
> // getCompletedSnapshotDir is a static helper on SnapshotDescriptionUtils
> Path snapshotPath =
>     SnapshotDescriptionUtils.getCompletedSnapshotDir(snapshotName, rootDir);
> Method method = ExportSnapshot.class.getDeclaredMethod("getSnapshotFiles",
>     Configuration.class, FileSystem.class, Path.class);
> method.setAccessible(true);
> @SuppressWarnings("unchecked")
> List<Pair<SnapshotFileInfo, Long>> snapshotFiles =
>     (List<Pair<SnapshotFileInfo, Long>>)
>         method.invoke(null, conf, fs, snapshotPath);
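>
> Note that getDeclaredMethod and invoke throw checked exceptions
> (NoSuchMethodException, IllegalAccessException, and
> InvocationTargetException), so wrap the above in a try/catch or declare
> them. Since it relies on a private method, it can also break on an HBase
> upgrade if that method's signature changes.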
>
> I would love to know how other people are tackling this issue as well.
>
> -Tim
>
> On Mon, Nov 20, 2017 at 7:45 PM, Lex Toumbourou <lex@scrunch.com> wrote:
>
> > Hi all,
> >
> > Wondering if I could get some help figuring out how to clean out old
> > snapshots that have been exported to S3?
> >
> > We've been exporting snapshots to S3 using the export snapshot command:
> >
> > bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
> >     -snapshot some-snapshot -copy-to s3a://some-bucket/hbase
> >
> >
> > Now the size of the S3 bucket is getting a little out of control and I'd
> > like to remove the old snapshots and let HBase garbage collect blocks no
> > longer referenced.
> >
> > One idea I had was to spin up an entirely new cluster that uses the S3
> > bucket as its hbase.rootdir, delete the snapshots as normal, and maybe
> > use cleaner_run to clean up the old files, but it feels like overkill
> > to spin up an entire cluster just for that.
> >
> > So my question is: what's the best approach for deleting snapshots
> > exported to an S3 bucket and cleaning up old store files that are no
> > longer referenced? We are using HBase 1.3.1 on EMR.
> >
> > Thanks!
> >
> > Lex Toumbourou, CTO at scrunch.com <http://scrunch.com/>
> >
>
