hbase-user mailing list archives

From Jon Schutz <jon.sch...@youramigo.com>
Subject Re: TTL, Versions and storing long history
Date Tue, 30 Jun 2009 06:05:34 GMT
Thanks Jonathan, that advice is helpful.

I've seen 0.20 mentioned a few times on the list - is this a reference
to current SVN HEAD, and if so is it considered sufficiently stable to
be deployable?


Jon Schutz 			My tech notes http://notes.jschutz.net
Chief Technology Officer 	http://www.youramigo.com

Jonathan Gray wrote:
> Jon,
> Prior to 0.20, I would definitely recommend moving the time component into
> the keys, columns, or values.  Even after 0.20, I recommend doing that
> if you want complete control.  My personal philosophy is that versions
> are for versioning, and if you are really using them as a time dimension
> of individual data points, you should consider not using versions.
> However, in 0.20 the API and server-side implementation for versions are
> greatly improved.  You can specify timestamps manually and you can query
> for any range you want, in both gets and scans.
> There is not currently a way to keep versions < x weeks old but always
> keep the latest version.  If you wanted to enforce something like that,
> you could always write a MapReduce job that ran periodically and
> enforced what you wanted.
> If you want to keep history forever, the idea is to use the "big enough"
> values.  In practice, only since HBase 0.20 have we been able to handle
> millions of versions of a single column (Integer.MAX_VALUE is >2
> billion, far beyond the capabilities of HBase).  The same goes for
> TTL... 2 billion seconds is over 60 years.  Could also move everything
> to Long which would ensure there would never be an issue.  Will dig more
> and let you know.
> In any case, you'll need 0.20 to fully take advantage of versions.
> Hope that helps.
> JG
> Jon Schutz wrote:
>> How do TTL and Versions specifications interact?  I'm guessing that the
>> first limit reached applies, i.e. if TTL is 1 week and versions is 3,
>> adding a fourth update to a data record would cause the first to be
>> bumped even if it is less than a week old?  And if I only have 2
>> versions but one is 2 weeks old, the expired one gets bumped even though
>> the versions limit has not been reached?
>> Is there a way to say "Keep versions < x weeks old, but always keep at
>> least the latest version, no matter how old?"
>> Suppose I want to keep the history about a particular object forever.
>> Looks like TTL can be set to 'Forever' (-1) but Versions has no
>> 'infinite' setting - I guess that's OK as in practice MAXINT is "big
>> enough".  Would it be wise to use HBase like this to maintain a history,
>> or should I be adding a time component into the key and storing multiple
>> records?  Can anyone help outline the pros and cons?
>> Thanks,
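
As an illustration of the "time component in the key" option discussed above, here is a minimal sketch of one way to build such a row key. It is not taken from the thread and does not use any HBase API: the helper names (make_row_key, parse_row_key), the NUL separator, and the reversed-timestamp trick (so that newer records sort first under byte-wise key ordering) are all assumptions made for the example.

```python
import struct

# Assumption: timestamps are non-negative milliseconds that fit in a signed
# 64-bit integer, mirroring Java's Long.MAX_VALUE.
MAX_LONG = 2**63 - 1


def make_row_key(object_id: str, timestamp_ms: int) -> bytes:
    """Build a hypothetical composite key: <object id> NUL <reversed timestamp>.

    Subtracting the timestamp from MAX_LONG means that newer records get
    smaller byte strings, so they sort first in lexicographic key order.
    Assumes object_id contains no NUL byte.
    """
    reversed_ts = MAX_LONG - timestamp_ms
    # Big-endian encoding keeps byte order consistent with numeric order
    # for non-negative values.
    return object_id.encode("utf-8") + b"\x00" + struct.pack(">q", reversed_ts)


def parse_row_key(key: bytes):
    """Recover (object_id, timestamp_ms) from a key built by make_row_key."""
    object_id = key[:-9]          # strip the separator and the 8-byte suffix
    (reversed_ts,) = struct.unpack(">q", key[-8:])
    return object_id.decode("utf-8"), MAX_LONG - reversed_ts


if __name__ == "__main__":
    keys = sorted(make_row_key("obj1", t) for t in (1000, 3000, 2000))
    for k in keys:
        print(parse_row_key(k))
```

With keys of this shape, each update becomes its own row, so retention is governed by whatever cleanup you run yourself rather than by the column family's Versions or TTL settings, at the cost of needing a scan over the object's key prefix to read its history.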
