hbase-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Optimal setup for regular purging of old rows
Date Thu, 10 Mar 2011 04:04:24 GMT

For some reason there are suddenly lots of questions about purging old data.  
I'm looking at the same thing and was wondering:

* In my case, the same table is shared by multiple users, each of whom may have
a different data retention policy.  Thus, I think I need to look at each and 
every row and check if it's considered "expired" and thus ready for deletion.  
Ideally, I'd associate a TTL when I Put a row and HBase would automagically 
remove it when its time is up, but I don't think TTLs per row are doable, and 
neither is automagical expiration, right?

* Is the only option to have a column with the expiration timestamp, and have a 
nightly MR job that does a full table scan and purges all expired rows?  
Wouldn't that be *super* costly because *all* data would have to be read from 
disk just for this one thing?  And this would evict all good stuff from the OS 
cache (and maybe block cache and memstore?)  Is there a better way?

* Are there specific recommendations for how to define tables so that batches 
of rows can be removed efficiently on a regular basis?
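
A minimal sketch of the scheme described in the second bullet, in plain Python 
with no HBase client calls: an expiration timestamp is computed from the 
owning user's retention policy when the row is written, and a periodic job 
walks every row and collects the expired keys. The names RETENTION, 
expires_at and expired_row_keys, the user names, and the retention values are 
all hypothetical, and the list of (row_key, expiration) pairs only stands in 
for a full table scan.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-user retention policies (illustrative values only).
RETENTION = {
    "alice": timedelta(days=30),
    "bob": timedelta(days=7),
}


def expires_at(user, written_at):
    """Expiration timestamp to store in its own column when the row is Put."""
    return written_at + RETENTION[user]


def expired_row_keys(rows, now):
    """Return the keys of rows whose stored expiration timestamp has passed.

    `rows` stands in for a full table scan: an iterable of
    (row_key, expiration_timestamp) pairs. Every row is inspected.
    """
    return [key for key, expiration in rows if expiration <= now]


if __name__ == "__main__":
    utc = timezone.utc
    rows = [
        ("alice-1", expires_at("alice", datetime(2011, 2, 1, tzinfo=utc))),
        ("bob-1", expires_at("bob", datetime(2011, 3, 5, tzinfo=utc))),
    ]
    # Keys a nightly job would delete if it ran on 10 Mar 2011.
    print(expired_row_keys(rows, datetime(2011, 3, 10, tzinfo=utc)))
```

The check itself is cheap, but the loop still touches every row, which is 
exactly the full-scan cost the question is about.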

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
