hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: TTL performance
Date Thu, 21 Jun 2012 19:42:14 GMT
Hi Frédéric,

Have you looked at http://hbase.apache.org/book/versions.html? If I
understand correctly, what you want to do is already supported by
HBase. This post may be interesting too:
http://outerthought.org/blog/417-ot.html
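Option 3 in the quoted message needs a TTL value for each retention class. HBase sets TTLs per column family, in seconds, so each class needs a fixed number of seconds. The Java sketch below is a minimal illustration and rests on a few assumptions:

- A "month" is treated as 31 days, so data is never dropped early.
- The table names are hypothetical, modeled on the ones in the question.
- It only computes the values and does not call the HBase client.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of option 3: one table per retention class, each with a
 * column-family TTL. Table names and the 31-day "month" are assumptions.
 */
public final class RetentionTtl {

    // Assumption: a month counts as 31 days so no data expires early.
    private static final long DAYS_PER_MONTH = 31;

    private RetentionTtl() {}

    /** TTL in seconds for a retention period expressed in months. */
    public static int ttlSeconds(int months) {
        if (months <= 0) {
            throw new IllegalArgumentException("months must be positive: " + months);
        }
        long seconds = Duration.ofDays(months * DAYS_PER_MONTH).getSeconds();
        // The column-family TTL is an int number of seconds; fail loudly on overflow.
        return Math.toIntExact(seconds);
    }

    /** Hypothetical naming scheme for the per-retention tables. */
    public static String tableFor(int months) {
        return "to-be-destroyed-in-" + months + "-months";
    }

    public static void main(String[] args) {
        Map<String, Integer> plan = new LinkedHashMap<>();
        for (int months : new int[] {3, 6, 12}) {
            plan.put(tableFor(months), ttlSeconds(months));
        }
        plan.forEach((table, ttl) -> System.out.println(table + " -> TTL " + ttl + " s"));
    }
}
```

With the 0.9x client, a value like this would be passed to `HColumnDescriptor.setTimeToLive(int)` when creating each table. Reads stop returning expired cells, and compaction physically removes them from the store files.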


2012/6/21, Frédéric Fondement <frederic.fondement@uha.fr>:
> Hi all!
> Before I start, I'd like some feedback about TTL performance in
> HBase.
> My use case is the following. Data is constantly coming into the
> database (i.e. a write-intensive application). This data should be kept
> for a certain amount of time, either 3, 6, 12... months, depending on
> some external conditions. I can live with some data registered to live
> only 3 months even if conditions eventually change to 6 months.
> I can see three options here:
>      opt. 1: indexing in a secondary table using a salted timestamp as
> the key (salting is not a problem in my case)
>      opt. 2: creating different tables like
> 'to-be-destroyed-in-august-2012', 'to-be-destroyed-in-june-2012'... and
> then simply dropping them with a cron job
>      opt. 3: creating tables like 'to-be-destroyed-in-3-months' (with a
> 3-month TTL), 'to-be-destroyed-in-6-months' (with a 6-month TTL)...
> Which do you think is the most efficient?
> opt. 1 adds a bit more load to my already write-intensive workload.
> opt. 2 looks nice for deletion, but to read I need to scan at least
> 12 different tables, and each month my writes will be buffered during
> table creation (and region splitting! I still don't really know how to
> choose split keys).
> opt. 3 looks the nicest (only 3-4 tables to scan when reading), but
> won't my daily major compaction go crazy?
> It would be great to get some pointers before I start :-)
> Best regards,
> Frédéric.
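Option 1 in the question relies on a salted timestamp row key in a secondary table. Below is a minimal Java sketch of one common layout: a salt byte, then the big-endian timestamp, then the main-table row key. Several details are assumptions rather than anything stated in the thread:

- the bucket count;
- deriving the salt from a hash of the main-table row key;
- the class and method names.

A cleanup job would scan each salt bucket up to a cutoff time and delete the rows the index entries point to.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Sketch of option 1: a secondary "expiry index" row key laid out as
 * [salt][8-byte timestamp][main row key]. Bucket count and layout are assumptions.
 */
public final class SaltedTimestampKey {

    // Hypothetical number of salt buckets.
    public static final int BUCKETS = 16;

    private SaltedTimestampKey() {}

    /** Salt derived from the main row key, in [0, BUCKETS). */
    public static byte salt(byte[] mainRowKey) {
        return (byte) Math.floorMod(Arrays.hashCode(mainRowKey), BUCKETS);
    }

    /**
     * Index key: [salt][timestamp][main row key]. ByteBuffer writes big-endian,
     * so for non-negative timestamps byte order matches numeric order.
     */
    public static byte[] indexKey(long timestampMillis, byte[] mainRowKey) {
        return ByteBuffer.allocate(1 + Long.BYTES + mainRowKey.length)
                .put(salt(mainRowKey))
                .putLong(timestampMillis)
                .put(mainRowKey)
                .array();
    }

    /**
     * Exclusive stop row for one bucket when collecting entries older than
     * cutoffMillis: scan from [bucket] up to [bucket][cutoff].
     */
    public static byte[] stopRow(int bucket, long cutoffMillis) {
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put((byte) bucket)
                .putLong(cutoffMillis)
                .array();
    }
}
```

Salting spreads the time-ordered writes over several key prefixes, so they do not all land at the end of the key space. The price is one scan per bucket when collecting expired rows.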

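For option 2 in the question, the bookkeeping reduces to month arithmetic. The Java sketch below uses `java.time` and assumes a table is dropped only after its named month has ended. Under that rule every row lives at least its retention period and at most about one month longer. The rounding rule and method names are hypothetical; the table names follow the examples in the question.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Sketch of option 2: one table per destruction month, dropped by a cron
 * job once that month is over. The rounding rule is an assumption.
 */
public final class MonthlyTables {

    private MonthlyTables() {}

    /** e.g. 2012-08 -> "to-be-destroyed-in-august-2012". */
    public static String tableName(YearMonth destroyMonth) {
        return "to-be-destroyed-in-"
                + destroyMonth.getMonth().name().toLowerCase(Locale.ROOT)
                + "-" + destroyMonth.getYear();
    }

    /** Destruction month for a write: the write month plus the retention period. */
    public static YearMonth destroyMonth(LocalDate writeDate, int retentionMonths) {
        return YearMonth.from(writeDate).plusMonths(retentionMonths);
    }

    /** True once the named month has fully passed, so the cron job may drop it. */
    public static boolean droppable(YearMonth destroyMonth, LocalDate today) {
        return destroyMonth.isBefore(YearMonth.from(today));
    }

    /** Tables that may still hold live data, i.e. the ones a reader must scan. */
    public static List<String> liveTables(LocalDate today, int maxRetentionMonths) {
        List<String> names = new ArrayList<>();
        YearMonth current = YearMonth.from(today);
        for (int i = 0; i <= maxRetentionMonths; i++) {
            names.add(tableName(current.plusMonths(i)));
        }
        return names;
    }
}
```

With a 12-month maximum retention, `liveTables` returns 13 names. That matches the concern in the question that reads would have to fan out over a dozen or so tables.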