From: Jonathan Gray
Date: Mon, 29 Jun 2009 19:46:34 -0700
To: hbase-user@hadoop.apache.org
Subject: Re: TTL, Versions and storing long history

Jon,

Prior to 0.20, I would definitely recommend moving the time component
into the keys, columns, or values. Even after 0.20, I recommend doing
that if you want complete control.

My personal philosophy is that versions are for versioning, and if you
are really using them as a time dimension of individual data points, you
should consider not using versions.

However, the API and server-side implementation for versions are greatly
improved in 0.20. You can specify timestamps manually, and you can query
for any time range you want, in both gets and scans.

There is currently no way to keep versions < x weeks old but always keep
the latest version. If you wanted to enforce something like that, you
could always write a MapReduce job that runs periodically and enforces
the policy you want.

If you want to keep history forever, the idea is to use the "big enough"
values. In practice, only since HBase 0.20 have we been able to handle
millions of versions of a single column (Integer.MAX_VALUE is > 2
billion, far beyond the capabilities of HBase). The same goes for TTL:
2 billion seconds is over 60 years. We could also move everything to
Long, which would ensure there would never be an issue. I will dig more
and let you know.

In any case, you'll need 0.20 to fully take advantage of versions.

Hope that helps.

JG

Jon Schutz wrote:
> How do TTL and Versions specifications interact? I'm guessing that the
> first limit reached applies, i.e. if TTL is 1 week and versions is 3,
> adding a fourth update to a data record would cause the first to be
> bumped even if it is less than a week old? And if I only have 2
> versions but one is 2 weeks old, the expired one gets bumped even though
> the versions limit has not been reached?
>
> Is there a way to say "Keep versions < x weeks old, but always keep at
> least the latest version, no matter how old?"
>
> Suppose I want to keep the history about a particular object forever.
> Looks like TTL can be set to 'Forever' (-1) but Versions has no
> 'infinite' setting - I guess that's OK as in practice MAXINT is "big
> enough". Would it be wise to use Hbase like this to maintain a history,
> or should I be adding a time component into the key and storing multiple
> records? Can anyone help outline the pros and cons?
>
> Thanks,
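
As a rough illustration of the retention rule discussed in this thread ("keep versions < x weeks old, but always keep at least the latest version"), here is a minimal Java sketch of the selection logic a periodic cleanup job could apply to the version timestamps of a single cell. It is plain Java and makes no HBase calls; the class name VersionRetentionSketch, the method retain, and its parameters are hypothetical names chosen for this example, not part of any HBase API. Reading the versions from a table and deleting the ones not returned is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not an HBase class: decides which versions of one
// cell survive a "younger than maxAge, but always keep the newest" policy.
final class VersionRetentionSketch {

    /**
     * Returns the timestamps to keep, newest first: every version strictly
     * younger than maxAgeMillis, plus the single newest version even when
     * it is older than that. An empty input yields an empty result.
     */
    static List<Long> retain(List<Long> versionTimestamps,
                             long nowMillis,
                             long maxAgeMillis) {
        List<Long> sorted = new ArrayList<>(versionTimestamps);
        sorted.sort(Collections.reverseOrder()); // newest first

        long cutoff = nowMillis - maxAgeMillis;  // anything at or before this is expired
        List<Long> kept = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            long ts = sorted.get(i);
            boolean isLatest = (i == 0);
            if (isLatest || ts > cutoff) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        long week = 7 * day;
        long now = 10 * week; // arbitrary "current time" for the demo

        // Versions written 3 and 5 weeks ago, with a 2-week age limit:
        // both are expired, yet the newer one is still retained.
        List<Long> versions = new ArrayList<>();
        versions.add(now - 5 * week);
        versions.add(now - 3 * week);
        System.out.println(retain(versions, now, 2 * week));
    }
}
```

A job built this way would call retain once per cell and delete every timestamp missing from the result. Sorting a copy keeps the caller's list untouched, and treating index 0 specially is what guarantees that the latest version survives no matter how old it is.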