hbase-user mailing list archives

From "Jinsong Hu" <jinsong...@hotmail.com>
Subject hbase doesn't delete data older than TTL in old regions
Date Wed, 15 Sep 2010 16:54:31 GMT
I have tested TTL in HBase and found that it relies on compaction to
remove old data. However, if a region holds data that is older than the
TTL and nothing triggers a compaction of that region, the data will remain
there forever, wasting disk space and memory.
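
A minimal sketch of the behavior described above. This is a simplified
Python model, not HBase code: a cell past its TTL is only physically
dropped when a compaction rewrites the store, so a store that is never
compacted keeps its expired cells. The `Cell` type and `compact` function
are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    timestamp_ms: int  # write time, in milliseconds since the epoch
    value: bytes


def compact(cells, ttl_ms, now_ms):
    """Rewrite a store's cells, keeping only those still within the TTL.

    Simplified model (an assumption, not HBase's implementation): a cell
    counts as expired once its age exceeds ttl_ms. Until this function runs,
    expired cells stay in the store and keep using disk space.
    """
    return [c for c in cells if now_ms - c.timestamp_ms <= ttl_ms]
```

For example, with a TTL of 5000 ms at time 10000, a cell written at 1000
is dropped and a cell written at 9000 survives, but only when `compact`
is actually called.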

At this point, it appears that the only way to actually remove data older
than the TTL is to issue client-side delete requests. This is a pity,
because it is a more expensive way to get the job done. Another side
effect is that, over time, we end up with many small regions if data is
stored in regions in chronological order. HBase does not currently seem to
have a mechanism to merge two consecutive small regions into a bigger one.
So if data is stored in chronological order, sooner or later we will run
out of capacity, even if the total amount of data in HBase is small,
because we will have many regions that each hold very little data.

A much cheaper way to remove data older than the TTL would be to record
the latest timestamp for each region in the .META. table. If that
timestamp is older than the TTL, we could simply adjust the region's row in
.META. and delete the store, without doing any compaction.
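
A minimal sketch of the proposed check, again as a simplified Python model
rather than HBase code. It assumes .META. would store the newest cell
timestamp per region (the `max_timestamp_ms` field is hypothetical, as are
`RegionMeta`, `fully_expired` and `regions_to_drop`). Since every cell in
the region is no newer than that timestamp, a single comparison tells us
whether the whole region has expired, with no need to read or rewrite any
data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionMeta:
    region_name: str
    # Hypothetical field: newest cell timestamp written to this region (ms).
    max_timestamp_ms: int


def fully_expired(meta, ttl_ms, now_ms):
    # If even the newest cell is past the TTL, every cell in the region is.
    return now_ms - meta.max_timestamp_ms > ttl_ms


def regions_to_drop(metas, ttl_ms, now_ms):
    # Regions whose stores could be deleted outright instead of compacted.
    return [m.region_name for m in metas if fully_expired(m, ttl_ms, now_ms)]
```

With a TTL of 5000 ms at time 10000, a region whose newest cell is at 1000
would be dropped, while one whose newest cell is at 9000 would be kept.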

Could this be added to the HBase requirements for a future release?

