From: Andrew Purtell
Reply-To: apurtell@apache.org
To: user@hbase.apache.org
Subject: Re: hbase doesn't delete data older than TTL in old regions
Date: Wed, 15 Sep 2010 17:29:23 -0700 (PDT)
Message-ID: <994445.13104.qm@web65504.mail.ac4.yahoo.com>

Yeah, indeed the TTL feature is not broken. It works as "advertised" if
you understand how HBase internals work.

But we can accommodate the expectations communicated on this thread; it
sounds reasonable.

   - Andy


--- On Wed, 9/15/10, Ryan Rawson wrote:

> From: Ryan Rawson
> Subject: Re: hbase doesn't delete data older than TTL in old regions
> To: user@hbase.apache.org
> Date: Wednesday, September 15, 2010, 11:43 AM
>
> I feel the need to pipe in here, since people are accusing hbase of
> having a broken feature, 'TTL', when the description in this email
> thread, and my own knowledge, doesn't really describe a broken feature.
> Non-optimal maybe, but not broken.
>
> First off, the TTL feature works on the timestamp, thus rowkey
> structure is not related. This is because the timestamp is stored in
> a different field. If you are also storing the data in chronological
> row key order, then you may end up with sparse or 'small' regions.
> But that doesn't mean the feature is broken (i.e., that it does not
> remove data older than the TTL). Needs tuning, yes, but not broken.
>
> Also note that "client side deletes" work in the same way that TTL
> does: you insert a tombstone marker, then a compaction actually purges
> the data itself.
>
> -ryan
>
> On Wed, Sep 15, 2010 at 11:26 AM, Jinsong Hu wrote:
> > I opened a ticket, https://issues.apache.org/jira/browse/HBASE-2999,
> > to track the issue. Dropping the old store, and updating the adjacent
> > region's key range when all stores for a region are gone, is probably
> > the cheapest solution, both in terms of coding and in terms of
> > resource usage in the cluster.
> > Do we know when this can be done?
> >
> > Jimmy.
> >
> > --------------------------------------------------
> > From: "Jonathan Gray"
> > Sent: Wednesday, September 15, 2010 11:06 AM
> > To:
> > Subject: RE: hbase doesn't delete data older than TTL in old regions
> >
> >> This sounds reasonable.
> >>
> >> We are tracking min/max timestamps in storefiles too, so it's
> >> possible that we could expire some files of a region as well, even
> >> if the region was not completely expired.
> >>
> >> Jinsong, mind filing a jira?
> >>
> >> JG
> >>
> >>> -----Original Message-----
> >>> From: Jinsong Hu [mailto:jinsong_hu@hotmail.com]
> >>> Sent: Wednesday, September 15, 2010 10:39 AM
> >>> To: user@hbase.apache.org
> >>> Subject: Re: hbase doesn't delete data older than TTL in old regions
> >>>
> >>> Yes, the current TTL based on compaction works as advertised if the
> >>> key randomly distributes the incoming data among all regions.
> >>> However, if the key is designed in chronological order, the TTL
> >>> doesn't really work, as no compaction will happen for data already
> >>> written. So we can't say that the current TTL really works as
> >>> advertised, as it is key structure dependent.
> >>>
> >>> This is a pity, because a major use case for hbase is for people to
> >>> store history or log data. Normally people only want to retain the
> >>> data for a fixed period. For example, the US government default
> >>> data retention policy is 7 years. Those data are saved in
> >>> chronological order.
> >>> The current TTL implementation doesn't work at all for that kind
> >>> of use case.
> >>>
> >>> In order for that use case to really work, hbase needs to have an
> >>> active thread that periodically runs and checks if there are data
> >>> older than TTL, deletes the data older than TTL if necessary, and
> >>> compacts small regions older than a certain time period into larger
> >>> ones to save system resources. It can optimize the deletion by
> >>> deleting the whole region if it detects that the last timestamp for
> >>> the region is older than TTL. There should be 2 parameters to
> >>> configure for hbase:
> >>>
> >>> 1. Whether to disable/enable the TTL thread.
> >>> 2. The interval at which the TTL thread will run. Maybe we can use
> >>>    a special value like 0 to indicate that we don't run the TTL
> >>>    thread, thus saving one configuration parameter. The default
> >>>    interval should probably be set to 1 day.
> >>> 3. How small a region must be before it is merged. It should be a
> >>>    percentage of the store size. For example, if 2 consecutive
> >>>    regions are only 10% of the store size (default is 256M), we can
> >>>    initiate a region merge. We probably need a parameter to reduce
> >>>    the merging too.
> >>>    For example, we only merge regions whose largest timestamp is
> >>>    older than half of TTL.
> >>>
> >>> Jimmy
> >>>
> >>> --------------------------------------------------
> >>> From: "Stack"
> >>> Sent: Wednesday, September 15, 2010 10:08 AM
> >>> To:
> >>> Subject: Re: hbase doesn't delete data older than TTL in old regions
> >>>
> >>> > On Wed, Sep 15, 2010 at 9:54 AM, Jinsong Hu wrote:
> >>> >> I have tested the TTL for hbase and found that it relies on
> >>> >> compaction to remove old data. However, if a region has data
> >>> >> that is older than TTL, and there is no trigger to compact it,
> >>> >> then the data will remain there forever, wasting disk space and
> >>> >> memory.
> >>> >>
> >>> >
> >>> > So it's working as advertised then?
> >>> >
> >>> > There's currently an issue where we can skip major compactions if
> >>> > your write loading has a particular character: hbase-2990.
> >>> >
> >>> >> It appears that in this state, to really remove data older than
> >>> >> TTL, we need to start a client side deletion request.
> >>> >
> >>> > Or run a manual major compaction:
> >>> >
> >>> > $ echo "major_compact 'TABLENAME'" | ./bin/hbase shell
> >>> >
> >>> >> This is really a pity because it is a more expensive way to get
> >>> >> the job done. Another side effect of this is that as time goes
> >>> >> on, we will end up with some small regions if the data are saved
> >>> >> in chronological order in regions.
> >>> >> It appears that hbase doesn't have a mechanism to merge 2
> >>> >> consecutive small regions into a bigger one at this time.
> >>> >
> >>> > $ ./bin/hbase org.apache.hadoop.hbase.util.Merge
> >>> > Usage: bin/hbase merge
> >>> >
> >>> > Currently only works on an offlined table, but there's a patch
> >>> > available to make it run against onlined regions.
> >>> >
> >>> >> So if data is saved in chronological order, sooner or later we
> >>> >> will run out of capacity, even if the amount of data in hbase is
> >>> >> small, because we have lots of regions with small storage space.
> >>> >>
> >>> >> A much cheaper way to remove data older than TTL would be to
> >>> >> remember the latest timestamp for the region in the .META. table,
> >>> >> and if the time is older than TTL, we just adjust the row in
> >>> >> .META. and delete the store, without doing any compaction.
> >>> >>
> >>> >
> >>> > Say more on the above. It sounds promising. Are you suggesting
> >>> > that, in addition to compactions, we also have a provision where
> >>> > we keep account of a storefile's latest timestamp (we already do
> >>> > this, I believe) and that when now - storefile-timestamp > ttl,
> >>> > we just remove the storefile wholesale? That sounds like it could
> >>> > work, if that is what you are suggesting. Mind filing an issue w/
> >>> > a detailed description?
> >>> >
> >>> > Thanks,
> >>> > St.Ack
> >>> >
> >>> >> Can this be added to the hbase requirements for a future release?
> >>> >>
> >>> >> Jimmy
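
For readers following the thread, the rule discussed above ("when now -
storefile-timestamp > ttl, we just remove the storefile wholesale") can
be sketched in a few lines. This is only an illustration of the idea,
not HBase code: the StoreFile class, its max_timestamp_ms field, and
both function names are hypothetical, and timestamps are assumed to be
milliseconds since the epoch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StoreFile:
    """Hypothetical stand-in for a store file; not an HBase class."""
    name: str
    max_timestamp_ms: int  # newest cell timestamp recorded for the file


def is_expired(store_file: StoreFile, ttl_ms: int, now_ms: int) -> bool:
    # If even the newest cell is older than the TTL, every cell in the
    # file is older than the TTL, so the file can go without a compaction.
    return now_ms - store_file.max_timestamp_ms > ttl_ms


def partition_expired(
    files: List[StoreFile], ttl_ms: int, now_ms: int
) -> Tuple[List[StoreFile], List[StoreFile]]:
    """Split files into (expired, live) using the rule above."""
    expired = [f for f in files if is_expired(f, ttl_ms, now_ms)]
    live = [f for f in files if not is_expired(f, ttl_ms, now_ms)]
    return expired, live
```

The check uses only the file's newest timestamp, which is why it needs
no scan of the data: a file is dropped only when nothing in it can
still be within the TTL. A region whose files are all expired is the
whole-region case raised earlier in the thread.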