hbase-user mailing list archives

From Ian Varley <ivar...@salesforce.com>
Subject Re: TTL for cell values
Date Sun, 14 Aug 2011 10:51:05 GMT
> "I don't think anyone is well served by that kind of shallow analysis."


You're right, Andy; sorry if it came off sounding flip. My point was simply that the idea
of a persistent data store with a configuration setting that makes the most current version
of your data disappear without an explicit delete is very counter-intuitive for traditional
database folks like me. Durability is the first, most inviolate rule, and this setting subverts
it in a way that is (at least for me) not obvious at first, and differs drastically from the
max versions setting. Maybe my confusion was due to the fact that I was looking for specific
behavior (HBASE-4071, essentially). I totally see your point, though; putting it the way I
did makes for a rather alarming pull quote. :(

I'm not at all suggesting we should alter the existing behavior (as if that were even possible
at this point); this is a useful setting for data that's basically just a cache. But this
is an area where the road from RDBMS to HBase might be a little bumpy for folks, and adding
a new option would also have the advantage of making it even more clear what TTL is for. 
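
To make the difference concrete, here is a minimal sketch of the two policies as a toy
model in plain Python. It is not HBase code and not how HBase implements expiry: the
function name, the `keep_latest` flag, and the rule that a version expires once its age
exceeds the TTL are all assumptions made for illustration. It only shows why a lone value
older than the TTL disappears today, and what a "TTL except for most recent" option along
the lines of HBASE-4071 would keep instead.

```python
from typing import List, Tuple

# One cell version: (timestamp in seconds, value). Toy model only.
Version = Tuple[int, str]


def surviving_versions(
    versions: List[Version],
    now: int,
    ttl: int,
    keep_latest: bool = False,  # hypothetical flag, stands in for the proposed option
) -> List[Version]:
    """Return the versions that survive TTL expiry, newest first.

    Assumption: a version is expired once (now - timestamp) exceeds ttl.
    """
    ordered = sorted(versions, key=lambda v: v[0], reverse=True)
    live = [v for v in ordered if now - v[0] <= ttl]
    # Current behavior: if every version is older than the TTL, nothing is left.
    # Proposed behavior: always retain the most recent version, whatever its age.
    if keep_latest and not live and ordered:
        live = [ordered[0]]
    return live


if __name__ == "__main__":
    lone_old_value = [(100, "a")]
    print(surviving_versions(lone_old_value, now=1000, ttl=600))
    print(surviving_versions(lone_old_value, now=1000, ttl=600, keep_latest=True))
```

In this model, the first call leaves the cell empty, while the second keeps the single
old value, which is the behavior a durable-by-default store would lead you to expect.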

Ian


On Aug 13, 2011, at 11:28 PM, "Andrew Purtell" <apurtell@apache.org> wrote:

>>  When I was talking to someone the other day about the current TTL policy, he was
>>  like "WTF, who would want that, it eats your data?"
>  
> I don't think anyone is well served by that kind of shallow analysis. 
> 
> The TTL feature was introduced for the convenience of having the system automatically
> garbage collect transient data. If you set a TTL on a column family, you are telling the system
> that the data shall expire after that interval elapses, that the data is only useful for the
> configured time period. If the data should not actually be considered transient, then configuring
> a TTL is the wrong thing to do -- at least currently.
> 
>>  "TTL except for most recent"
> 
> HBASE-4071 is a useful and good idea.
> 
> Best regards,
> 
> 
> - Andy
> 
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
> 
> 
>> ________________________________
>> From: Ian Varley <ivarley@salesforce.com>
>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>> Sent: Saturday, August 13, 2011 8:24 PM
>> Subject: Re: TTL for cell values
>> 
>> So, what you're saying is:
>> 
>> http://lmgtfy.com/?q=hbase+ttl+remove+all+versions+except+most+recent
>> 
>> :)
>> 
>> I like the idea of making this pluggable (via the coprocessor framework, or otherwise).
>> But I also think this is a fundamental enough policy option that making it hard-coded might
>> be a good idea. When I was talking to someone the other day about the current TTL policy,
>> he was like, "WTF, who would want that, it eats your data?". There's no such thing as a "keep
>> 0 versions" option, and thus no way to accidentally lose your most current data using that
>> approach. But with the TTL version there is, which is (IMO) counter-intuitive for those coming
>> from an RDBMS background.
>> 
>> Commented thusly in the JIRA. :)
>> 
>> Ian
>> 
>> On Aug 13, 2011, at 8:00 PM, lars hofhansl wrote:
>> 
>> Hey Ian, (how are things :)
>> 
>> I just stumbled across https://issues.apache.org/jira/browse/HBASE-4071.
>> 
>> -- Lars
>> 
>> 
>> ________________________________
>> From: Ian Varley <ivarley@salesforce.com>
>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>> Sent: Saturday, August 13, 2011 6:51 PM
>> Subject: TTL for cell values
>> 
>> Hi all,
>> 
>> Quick clarification on TTL for cells. The concept makes sense (instead of "keep 3
>> versions" you say "keep versions more recent than time T"). But, if there's only 1 value in
>> the cell, and that value is older than the TTL, will it also be deleted?
>> 
>> If so, has there ever been discussion of a "TTL except for most recent" option? (i.e.
>> you want the current version to be permanently persistent, but also want some time-based range
>> of version history, so you can peek back and get consistent snapshots within the last hour,
>> 6 hours, 24 hours, etc). TTL seems perfect for this, but not if it'll chomp the current version
>> of cells too! :)
>> 
>> Thanks!
>> Ian
>> 
>> 
>> 
>> 
