hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: hbase doesn't delete data older than TTL in old regions
Date Wed, 15 Sep 2010 17:08:13 GMT
On Wed, Sep 15, 2010 at 9:54 AM, Jinsong Hu <jinsong_hu@hotmail.com> wrote:
> I have tested the TTL for hbase and found that it relies on compaction to
> remove old data. However, if a region has data that is older
> than TTL, and there is no trigger to compact it, then the data will remain
> there forever, wasting disk space and memory.

So it's working as advertised then?

There's currently an issue where we can skip major compactions if your
write load has a particular character: HBASE-2990.

> It appears at this state, to really remove data older than TTL we need to
> start a client side deletion request.

Or run a manual major compaction:

$ echo "major_compact TABLENAME" | ./bin/hbase shell

> This is really a pity because
> it is a more expensive way to get the job done.  Another side effect of
> this is that as time goes on, we will end up with some small
> regions if the data are saved in chronological order in regions. It appears
> that hbase doesn't have a mechanism to merge 2 consecutive
> small regions into a bigger one at this time.

$ ./bin/hbase org.apache.hadoop.hbase.util.Merge
Usage: bin/hbase merge <table-name> <region-1> <region-2>

It currently only works on an offlined table, but there's a patch
available to make it run against onlined regions.

> So if data is saved in
> chronological order, sooner or later we will run out of capacity, even if
> the amount of data in hbase is small, because we have lots of regions with
> small storage space.
> A much cheaper way to remove data older than TTL would be to remember the
> latest timestamp for the region in the .META. table
> and if the time is older than TTL, we just adjust the row in .META. and
> delete the store, without doing any compaction.

Say more on the above.  It sounds promising.  Are you suggesting that,
in addition to compactions, we also have a provision where we keep
account of a storefile's latest timestamp (we already do this, I
believe) and that when now - storefile-timestamp > ttl, we just remove
the storefile wholesale?  That sounds like it could work, if that is
what you are suggesting.  Mind filing an issue w/ a detailed


> Can this be added to the hbase requirement for future release?
> Jimmy
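
For illustration only, the storefile-level check discussed above (drop a
storefile wholesale once now - storefile-timestamp > ttl) might be sketched
as below. This is not HBase code: the StoreFile class, its field names, and
the expired_storefiles helper are hypothetical stand-ins, and the timestamps
are made-up values in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class StoreFile:
    # Hypothetical stand-in for a storefile; not an HBase class.
    path: str
    latest_timestamp_ms: int  # timestamp of the newest cell in the file


def expired_storefiles(storefiles, ttl_ms, now_ms):
    """Return the files whose newest cell is already older than the TTL.

    If even the newest cell has expired, every cell in the file has,
    so the whole file could be removed without a compaction.
    """
    return [sf for sf in storefiles if now_ms - sf.latest_timestamp_ms > ttl_ms]


# Made-up example: one file is entirely past the TTL, one is not.
files = [
    StoreFile("region-a/cf/old", latest_timestamp_ms=1_000),
    StoreFile("region-a/cf/new", latest_timestamp_ms=9_500),
]
to_drop = expired_storefiles(files, ttl_ms=5_000, now_ms=10_000)
print([sf.path for sf in to_drop])
```

The check uses only the newest timestamp per file, so a file holding a mix
of expired and live cells is left for a normal compaction to clean up.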
