hbase-user mailing list archives

From Frédéric Fondement <frederic.fondem...@uha.fr>
Subject TTL performance
Date Thu, 21 Jun 2012 14:19:12 GMT
Hi all!

Before I start, I'd like to have some feedback about TTL performance in HBase.

My use case is the following. I have data constantly coming into the database
(i.e. a write-intensive application). This data should be kept for a
certain amount of time, either 3, 6, 12... months, depending on some
external conditions. I can live with some data registered to live only 3
months even if conditions eventually change to 6 months.

I can see three options here:

     opt. 1: indexing in a secondary table using a salted timestamp as the
key (this is not a problem in my case)
     opt. 2: creating different tables like
'to-be-destroyed-in-august-2012', 'to-be-destroyed-in-june-2012'... and
then merely dropping them with a cron job
     opt. 3: creating tables like 'to-be-destroyed-in-3-months' (with a
3-month TTL), 'to-be-destroyed-in-6-months' (with a 6-month TTL)...
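
To make the three options a bit more concrete, here is a minimal sketch in plain Java (no HBase client calls). Everything in it is an assumption made for illustration: the class and method names, the number of salt buckets, the key layout (salt byte, expiry timestamp, data row key), and the approximation of one month as 30 days. The only HBase-specific fact relied on is that a column family TTL is expressed in seconds.

```java
import java.nio.ByteBuffer;
import java.time.YearMonth;
import java.util.Arrays;
import java.util.Locale;

public class RetentionSketch {
    // Hypothetical number of salt buckets used to spread index writes.
    static final int SALT_BUCKETS = 16;

    // opt. 1: row key for the secondary index table.
    // Assumed layout: 1 salt byte + 8-byte expiry timestamp + original row key.
    static byte[] saltedIndexKey(long expiryMillis, byte[] dataRowKey) {
        int salt = Math.floorMod(Arrays.hashCode(dataRowKey), SALT_BUCKETS);
        return ByteBuffer.allocate(1 + Long.BYTES + dataRowKey.length)
                .put((byte) salt)
                .putLong(expiryMillis)
                .put(dataRowKey)
                .array();
    }

    // opt. 2: name of the monthly table a record goes to,
    // derived from the month it is written and its retention.
    static String monthlyTableName(YearMonth written, int retentionMonths) {
        YearMonth expiry = written.plusMonths(retentionMonths);
        return "to-be-destroyed-in-"
                + expiry.getMonth().name().toLowerCase(Locale.ROOT)
                + "-" + expiry.getYear();
    }

    // opt. 3: TTL value for a column family, in seconds.
    // Assumes a month is approximated as 30 days.
    static int ttlSeconds(int retentionMonths) {
        return retentionMonths * 30 * 24 * 60 * 60;
    }

    public static void main(String[] args) {
        byte[] key = saltedIndexKey(1340288352000L, "row-42".getBytes());
        System.out.println("index key length: " + key.length);
        System.out.println(monthlyTableName(YearMonth.of(2012, 6), 2));
        System.out.println("3-month TTL in seconds: " + ttlSeconds(3));
    }
}
```

With opt. 1, a cleanup job would scan each salt bucket up to the current time and delete the referenced rows; with opt. 2 and opt. 3 the retention is carried by the table itself.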

What do you think is the most efficient?
opt. 1 adds a little more load to my already write-intensive context
opt. 2 looks nice (regarding deletion), but to read, I need to scan at
least 12 different tables, and each month, my data will be buffered
during table creation (and region splitting! I still don't really
know how to choose split keys)
opt. 3 looks the nicest (only 3-4 tables to scan when reading), but won't
my daily major compaction go crazy?
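
To illustrate the read-side cost of opt. 2, the set of monthly tables a reader has to scan could be computed roughly as below. This is only a sketch: the class and method names are hypothetical, and it assumes that a table is dropped only once its named month is over, so the current month's table still counts as live.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MonthlyTablesSketch {
    // Builds a table name in the 'to-be-destroyed-in-<month>-<year>' style.
    static String tableNameFor(YearMonth expiry) {
        return "to-be-destroyed-in-"
                + expiry.getMonth().name().toLowerCase(Locale.ROOT)
                + "-" + expiry.getYear();
    }

    // Lists every table that may still hold live data: one per expiry month,
    // from the current month up to the longest retention period.
    static List<String> tablesToScan(YearMonth now, int maxRetentionMonths) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i <= maxRetentionMonths; i++) {
            names.add(tableNameFor(now.plusMonths(i)));
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : tablesToScan(YearMonth.of(2012, 6), 12)) {
            System.out.println(name);
        }
    }
}
```

Under that assumption, a 12-month maximum retention means 13 tables per read, whereas opt. 3 only needs one table per distinct TTL.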

It would be great to have some clues before doing the job :-)

Best regards,

