hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: A use case for ttl deletion?
Date Fri, 26 Sep 2014 16:21:40 GMT
This is a good writeup that should probably go to refguide.

bq. example would be password reset attempts

In some systems such information would have a long retention period (perhaps
to comply with certain regulations).

Cheers

On Fri, Sep 26, 2014 at 9:10 AM, Wilm Schumacher <wilm.schumacher@cawoom.com
> wrote:

> Hi,
>
> your mail got me thinking about a general answer.
>
> I think a good answer would be: all data that are only useful for a
> specific time AND can be generated without bound by a finite number
> of users should have a ttl. OR when the available space is very small
> compared to the number of users.
>
> One example is cookies. A single user generates a handful of
> cookie events per day. Let's just look at the generation of a session,
> perhaps once a day. So even with a finite number of users and a finite
> amount of data per user per day, the number of cookies would grow day by
> day, without any useful purpose (assuming that you use a cookie
> system with sessions that expire).
>
> Another example would be password reset attempts or something like that
> in a web app. These events should expire after a number of days and
> should be deleted after a longer time (so that you can still say that the
> attempt is "out of date" or something like that, there should be 2
> different "expiration times"). Without that, the password reset attempts
> would just be old junk in your db. Or you would have to write MR jobs to
> clean the db on a regular basis.
>
> An example could also be an aggregation service, where a user can make a
> list of things to be saved that are generated elsewhere (e.g. news
> headlines). A finite number of users would generate an unbounded number
> of rows just by waiting. So you could set a policy where only the last 30
> days are aggregated. And this could be implemented by a ttl.
>
> A further example would be a mechanism to prevent brute force attacks
> where you save the last attempts, and if a user has more than N attempts
> in M seconds the attempt fails. This could be implemented by a column
> family "attempts", where the last attempts are saved. If the number of
> saved attempts is larger than N => fail. And when you set the TTL to M
> seconds, you are ready to go.
>
> An example for the second use case (finite space for a large number of
> users) would be a service that serves files for fast and easy sharing
> between the users, paid for by ads. Thus you have a large user base, but
> very little space. An example would be "one click hosting" or something
> like that, where the users use the files for perhaps a week, and then
> forget all about them. So in your policy there could be something like
> "expire 30 days after last use", which you can implement just by a
> ttl and without MR jobs.
>
> All these examples come from the usage of hbase for the implementation of
> user driven systems. Web apps or something like that. However, it should
> be easy to find examples for more general applications of hbase. I once
> read a question from an hbase user who had the problem that the
> logging (which was saved in hbase) grew too large. He only wanted
> to keep the last N days and asked for help implementing an MR job
> which regularly removes older logging messages. A ttl and he was good to
> go ;).
>
> Hope this helped.
>
> Best wishes
>
> Wilm
>
> Am 26.09.2014 um 17:20 schrieb yonghu:
> > Hello,
> >
> > Can anyone give me a concrete use case for ttl deletions? I mean in which
> > situation we should set ttl property?
> >
> > regards!
> >
> > Yong
> >
>
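
The brute force example in the quoted mail (more than N attempts within M
seconds => fail) can be modeled in a few lines of Java. This is only an
in-memory sketch of the TTL semantics, not HBase client code: the class name
AttemptLimiter, its method tryAttempt, and the explicit clock argument are
made up for illustration, and timestamps are assumed to arrive in
non-decreasing order. In HBase itself the expiry step would not be written by
hand, since the TTL on the "attempts" column family hides expired cells.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** In-memory model of one user's "attempts" column family with a TTL (hypothetical helper). */
class AttemptLimiter {
    private final int maxAttempts;   // N in the mail
    private final long ttlMillis;    // M seconds, expressed in milliseconds
    // Write timestamps of the stored attempts, oldest first.
    private final Deque<Long> attempts = new ArrayDeque<>();

    AttemptLimiter(int maxAttempts, long ttlMillis) {
        this.maxAttempts = maxAttempts;
        this.ttlMillis = ttlMillis;
    }

    /**
     * Stores an attempt made at nowMillis and reports whether it is allowed.
     * Returns false once more than N unexpired attempts are stored.
     */
    boolean tryAttempt(long nowMillis) {
        // Emulate the TTL: entries at least ttlMillis old are no longer visible.
        while (!attempts.isEmpty() && nowMillis - attempts.peekFirst() >= ttlMillis) {
            attempts.pollFirst();
        }
        attempts.addLast(nowMillis);
        return attempts.size() <= maxAttempts;
    }
}
```

With N = 3 and a TTL of 60 seconds, three attempts in quick succession are
allowed and a fourth is rejected; once the oldest entries have aged past the
TTL, new attempts are accepted again without any cleanup job.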