hbase-user mailing list archives

From Wilm Schumacher <wilm.schumac...@cawoom.com>
Subject Re: A use case for ttl deletion?
Date Fri, 26 Sep 2014 16:10:36 GMT
Hi,

your mail got me thinking about a general answer.

I think a good answer would be: all data that is only useful for a
limited time AND can be generated without bound by a finite number
of users should have a ttl. OR when the available space is very small
compared to the number of users.

One example is cookies. A single user generates a handful of
cookie events per day. Let's just look at the creation of a session,
perhaps once a day. So even for a finite number of users and a finite
amount of data per user per day, the number of stored cookies would grow
and grow every day, without any useful purpose (under the assumption that
you use such a cookie system with a session that expires).

Another example would be password reset attempts or something like that
in a web app. These events should expire after a number of days and
should be deleted after a longer time (there should be 2 different
"expiration times", so that in between you can still say that the attempt
is "out of date" or something like that). Without that, the password reset
attempts would just be old junk in your db. Or you would have to write MR
jobs to clean the db on a regular basis.
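
To make the two "expiration times" concrete, here is a minimal sketch in
plain Python. It does not call HBase at all; it only models the decision.
The names (reset_status, ResetStatus) and the 3-day and 30-day values are
my own assumptions for illustration. The shorter limit is checked by the
application, the longer one stands in for the ttl after which the store
drops the row.

```python
from enum import Enum

DAY = 24 * 60 * 60  # seconds


class ResetStatus(Enum):
    VALID = "valid"              # the reset link may still be used
    OUT_OF_DATE = "out of date"  # row still stored, but too old to use
    UNKNOWN = "unknown"          # row already removed by the ttl


def reset_status(created_at, now, valid_for=3 * DAY, ttl=30 * DAY):
    """Classify a password reset attempt by its age in seconds.

    valid_for is the application-level expiration (assumed: 3 days).
    ttl models the storage-level deletion time (assumed: 30 days).
    """
    age = now - created_at
    if age >= ttl:
        return ResetStatus.UNKNOWN
    if age >= valid_for:
        return ResetStatus.OUT_OF_DATE
    return ResetStatus.VALID
```

With these values, an attempt that is 5 days old is reported as "out of
date", and after 30 days nothing is left to clean up.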

An example could also be an aggregation service, where a user can make a
list of things to be saved that are generated elsewhere (e.g. news
headlines). A finite number of users would generate an infinite number of
rows just by waiting. So you could make a policy where only the last 30
days are aggregated. And this could be implemented by a ttl.

A further example would be a mechanism to prevent brute force attacks
where you save the last attempts, and if a user has more than N attempts
in M seconds the attempt fails. This could be implemented by a column
family "attempts", where the last attempts are saved. If the number of
saved attempts is larger than N => fail. And when you set the TTL to
M seconds, you are ready to go.
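
The idea can be sketched in a few lines of plain Python. This is only an
in-memory model of the "attempts" column family, not HBase client code.
AttemptLog, attempt_allowed and the injectable clock are hypothetical
names I made up for the illustration. Every attempt is saved, attempts
older than the ttl are dropped, and the attempt fails if more than N are
left.

```python
import time


class AttemptLog:
    """In-memory stand-in for a column family with a ttl (assumption)."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the sketch can be tested
        self._attempts = {}     # user -> list of attempt timestamps

    def record(self, user):
        """Save one attempt with the current time as its timestamp."""
        self._attempts.setdefault(user, []).append(self.clock())

    def live_attempts(self, user):
        """Count attempts younger than the ttl and drop the older ones."""
        now = self.clock()
        live = [t for t in self._attempts.get(user, []) if now - t < self.ttl]
        self._attempts[user] = live
        return len(live)


def attempt_allowed(log, user, max_attempts):
    """Save the attempt, then fail if more than max_attempts are live."""
    log.record(user)
    return log.live_attempts(user) <= max_attempts
```

Note that in this sketch a rejected attempt is saved as well, so somebody
who keeps hammering stays locked out until M seconds pass without a new
attempt. Whether that is what you want is a policy decision.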

An example for the second use case (finite space for a large number of
users) would be a service that serves files for fast and easy sharing
between the users, paid by ads. Thus you have a large user base, but
very little space. An example would be "one click hosting" or something
like that, where the users use the files for perhaps a week, and then
forget all about them. So in your policy there could be something like
"expire 30 days after last use" which you can implement just by a
ttl and without MR jobs.
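
One detail for the "after last use" part: as far as I know, the age of a
cell is counted from its timestamp, so a read alone would not extend its
life. You would have to write something on every use. Below is a minimal
plain-Python model of that idea. It is not HBase code, and ExpiringFiles
and its methods are hypothetical names for illustration only.

```python
DAY = 24 * 60 * 60  # seconds


class ExpiringFiles:
    """Models entries that vanish ttl seconds after their last write."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._last_write = {}   # file name -> timestamp of last write

    def put(self, name, now):
        """Store a file; its lifetime starts at `now`."""
        self._last_write[name] = now

    def exists(self, name, now):
        """True while the last write is younger than the ttl."""
        written = self._last_write.get(name)
        return written is not None and now - written < self.ttl

    def use(self, name, now):
        """Use a file and rewrite its timestamp to restart the ttl."""
        if not self.exists(name, now):
            self._last_write.pop(name, None)  # expired or never stored
            return False
        self._last_write[name] = now
        return True
```

With a ttl of 30 * DAY, a file stored on day 0 and used on day 20 is still
there on day 45, and is gone on day 51 if nobody touched it again.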

All these examples come from the usage of hbase for the implementation of
user driven systems. Web apps or something like that. However, it should
be easy to find examples for more general applications of hbase. I once
read a question from an hbase user who had the problem that the
logging (which was saved in hbase) grew too large. He only wanted
to keep the last N days and asked for help implementing an MR job
which regularly removes older logging messages. A ttl and he was good to
go ;).

Hope this helped.

Best wishes

Wilm

On 26.09.2014 at 17:20, yonghu wrote:
> Hello,
> 
> Can anyone give me a concrete use case for ttl deletions? I mean, in which
> situation should we set the ttl property?
> 
> regards!
> 
> Yong
> 
