hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: A use case for ttl deletion?
Date Tue, 30 Sep 2014 13:15:43 GMT

The OP wants to know good use cases for the ttl setting.

Answer: Any situation where the cost of retaining the data exceeds the value to be gained
from the data.  Using ttl allows for automatic purging of data. 
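
To make "automatic purging" concrete, here is a minimal sketch of the
semantics. It is a toy in-memory model, not HBase code; the class name
TtlStore and its methods are made up for illustration. It assumes the
usual behaviour that a cell older than the ttl is no longer returned on
read and is physically dropped later by a cleanup pass (compaction, in
HBase terms).

```python
import time


class TtlStore:
    """Toy model of a store with a ttl (hypothetical, not the HBase API)."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self.cells = {}  # row key -> (value, write timestamp in seconds)

    def put(self, row, value, now=None):
        ts = time.time() if now is None else now
        self.cells[row] = (value, ts)

    def get(self, row, now=None):
        now = time.time() if now is None else now
        cell = self.cells.get(row)
        if cell is None:
            return None
        value, ts = cell
        # A cell older than the ttl is treated as gone, even before cleanup.
        if now - ts > self.ttl_seconds:
            return None
        return value

    def compact(self, now=None):
        """Physically drop expired cells (stands in for a compaction)."""
        now = time.time() if now is None else now
        self.cells = {
            row: cell
            for row, cell in self.cells.items()
            if now - cell[1] <= self.ttl_seconds
        }
```

The point of the sketch is that nobody has to issue a delete: expiry
follows from the write timestamp and the configured ttl alone.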

Answer2: Any situation where you have to enforce specific retention policies for compliance
reasons. As an example, not retaining client or customer access information longer than 12
months.  
(I can’t give a specific example, but there are EU data retention laws which limit how long
you can retain the data.)  Again, here you want to be able to show that there is an automated
method for removing aged data to ensure compliance. 
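
As a rough illustration of the 12 month example, the retention period
can be turned into a ttl in seconds (HBase expresses ttl on a column
family in seconds). The names below are hypothetical, and "12 months"
is approximated as 365 days, which is an assumption and not a legal
definition.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "12 months" is approximated as 365 days.
RETENTION = timedelta(days=365)
TTL_SECONDS = int(RETENTION.total_seconds())


def is_past_retention(written_at, now):
    """Return True if a record written at `written_at` is older than the policy allows."""
    return now - written_at > RETENTION


if __name__ == "__main__":
    now = datetime(2014, 9, 30, tzinfo=timezone.utc)
    print(TTL_SECONDS)
    print(is_past_retention(datetime(2013, 9, 1, tzinfo=timezone.utc), now))
```

With a ttl in place the check never has to be run by hand; it only
shows which records the store would stop serving.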


When you start to get into the IoT, a lot of data is generated and the cost of storage can
easily exceed the potential value of the data.  
While there is some value in capturing telemetry from your Android phone to show the path
you take from your desk down to the local Starbucks and which local Starbucks you go to, 3
years from now that raw data has very little value. So it would make sense to purge it. 

 

On Sep 26, 2014, at 11:21 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> This is a good writeup that should probably go into the refguide.
> 
> bq. example would be password reset attempts
> 
> In some systems such information would have a long retention period (maybe to
> conform to certain regulations).
> 
> Cheers
> 
> On Fri, Sep 26, 2014 at 9:10 AM, Wilm Schumacher <wilm.schumacher@cawoom.com> wrote:
> 
>> Hi,
>> 
>> your mail got me thinking about a general answer.
>> 
>> I think a good answer would be: all data that is only useful for a
>> specific time AND can be generated without bound by a finite number
>> of users should have a ttl. OR when the storage space is very small
>> compared to the number of users.
>> 
>> One example is cookies. A single user generates a handful of
>> cookie events per day. Let's just look at the generation of a session,
>> perhaps once a day. Even with a finite number of users and a finite
>> amount of data per user per day, the number of cookies would grow and
>> grow by the day, without any useful purpose (under the assumption that
>> you use such a cookie system with a session that expires).
>> 
>> Another example would be password reset attempts or something like that
>> in a web app. These events should expire after a number of days and
>> should be deleted after a longer time (to be able to say that the
>> attempt is "out of date" or something like that, there should be 2
>> different "expiration times"). Without that, the password reset attempts
>> would just be old junk in your db. Or you would have to write MR jobs
>> to clean the db on a regular basis.
>> 
>> An example could also be an aggregation service, where a user can make a
>> list of things to be saved that are generated elsewhere (e.g. news
>> headlines). A finite number of users would generate an infinite number of
>> rows just by waiting. So you could make a policy where only the last 30
>> days are aggregated. And this could be implemented by a ttl.
>> 
>> A further example would be a mechanism to prevent brute force attacks
>> where you save the last attempts, and if a user has more than N attempts
>> in M seconds the attempt fails. This could be implemented by a column
>> family "attempts", where the last attempts are saved. If it's larger
>> than N => fail. And when you set the TTL to M seconds, you are ready to go.
>> 
>> An example for the second use case (finite space for a large number of
>> users) would be a service that serves files for fast and easy sharing
>> between users, paid for by ads. Thus you have a large user base, but
>> very little space. An example would be "one click hosting" or something
>> like that, where the users use the files for perhaps a week and then
>> forget all about them. So in your policy there could be something like
>> "expire 30 days after last use", which you can implement just with a
>> ttl and without MR jobs.
>> 
>> All these examples come from the usage of hbase for the implementation of
>> user driven systems, web apps or something like that. However, it should
>> be easy to find examples for more general applications of hbase. I once
>> read a question from an hbase user who had the problem that the
>> logging (which was saved in hbase) grew too large. He only wanted
>> to keep the last N days and asked for help implementing an MR job
>> that regularly removes older log messages. A ttl and he was good to
>> go ;).
>> 
>> Hope this helped.
>> 
>> Best wishes
>> 
>> Wilm
>> 
>> On 26.09.2014 at 17:20, yonghu wrote:
>>> Hello,
>>> 
>>> Can anyone give me a concrete use case for ttl deletions? I mean, in which
>>> situations should we set the ttl property?
>>> 
>>> regards!
>>> 
>>> Yong
>>> 
>> 
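
The brute force check described in the quoted message (more than N
attempts within M seconds means the attempt fails) can be sketched as
follows. This is a plain in-memory stand-in, not HBase code: the class
AttemptLimiter is hypothetical, and dropping entries older than M
seconds by hand plays the role that a ttl of M seconds on an "attempts"
column family would play.

```python
from collections import defaultdict, deque


class AttemptLimiter:
    """Toy model of the "attempts" column family with a ttl of M seconds."""

    def __init__(self, max_attempts, window_seconds):
        self.max_attempts = max_attempts      # N in the quoted message
        self.window_seconds = window_seconds  # M, the value the ttl would take
        self.attempts = defaultdict(deque)    # user -> timestamps of attempts

    def allow(self, user, now):
        recent = self.attempts[user]
        # Emulate ttl expiry: forget attempts older than M seconds.
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        # Record this attempt, whether or not it ends up being allowed.
        recent.append(now)
        # More than N attempts inside the window means the attempt fails.
        return len(recent) <= self.max_attempts


if __name__ == "__main__":
    limiter = AttemptLimiter(max_attempts=3, window_seconds=10)
    for t in (0, 1, 2, 3, 100):
        print(t, limiter.allow("alice", now=t))
```

With a real ttl the while loop disappears: the store forgets old
attempts on its own, and the application only counts what is left.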

