Date: Wed, 15 Sep 2010 17:29:23 -0700 (PDT)
From: Andrew Purtell <apurtell@apache.org>
Reply-To: apurtell@apache.org
Subject: Re: hbase doesn't delete data older than TTL in old regions
To: user@hbase.apache.org

Yeah, indeed the TTL feature is not broken. It works as "advertised" if you
understand how HBase internals work.

But we can accommodate the expectations communicated on this thread; they
sound reasonable.

    - Andy


--- On Wed, 9/15/10, Ryan Rawson <ryanobjc@gmail.com> wrote:

> From: Ryan Rawson <ryanobjc@gmail.com>
> Subject: Re: hbase doesn't delete data older than TTL in old regions
> To: user@hbase.apache.org
> Date: Wednesday, September 15, 2010, 11:43 AM
>
> I feel the need to pipe in here, since people are accusing hbase of
> having a broken feature 'TTL' when the description in this email
> thread, and my own knowledge, don't really describe a broken feature.
> Non-optimal maybe, but not broken.
>
> First off, the TTL feature works on the timestamp, thus rowkey
> structure is not related. This is because the timestamp is stored in
> a different field. If you are also storing the data in row key
> chronological order, then you may end up with sparse or 'small'
> regions. But that doesn't mean the feature is broken, i.e. that it
> does not remove data older than the TTL. Needs tuning, yes, but not
> broken.
>
> Also note that "client side deletes" work in the same way that TTL
> does: you insert a tombstone marker, then a compaction actually purges
> the data itself.
>
> -ryan
>
> On Wed, Sep 15, 2010 at 11:26 AM, Jinsong Hu <jinsong_hu@hotmail.com> wrote:
> > I opened a ticket, https://issues.apache.org/jira/browse/HBASE-2999, to
> > track this issue. Dropping the old store, and updating the adjacent
> > region's key range when all stores for a region are gone, is probably
> > the cheapest solution, both in terms of coding and in terms of resource
> > usage in the cluster. Do we know when this can be done?
> >
> >
> > Jimmy.
> >
> > --------------------------------------------------
> > From: "Jonathan Gray" <jgray@facebook.com>
> > Sent: Wednesday, September 15, 2010 11:06 AM
> > To: <user@hbase.apache.org>
> > Subject: RE: hbase doesn't delete data older than TTL in old regions
> >
> >> This sounds reasonable.
> >>
> >> We are tracking min/max timestamps in storefiles too, so it's possible
> >> that we could expire some files of a region as well, even if the region
> >> was not completely expired.
> >>
> >> Jinsong, mind filing a jira?
> >>
> >> JG
> >>
> >>> -----Original Message-----
> >>> From: Jinsong Hu [mailto:jinsong_hu@hotmail.com]
> >>> Sent: Wednesday, September 15, 2010 10:39 AM
> >>> To: user@hbase.apache.org
> >>> Subject: Re: hbase doesn't delete data older than TTL in old regions
> >>>
> >>> Yes, the current TTL based on compaction works as advertised if the
> >>> key randomly distributes the incoming data among all regions. However,
> >>> if the key is designed in chronological order, the TTL doesn't really
> >>> work, as no compaction will happen for data already written. So we
> >>> can't say that the current TTL really works as advertised, as it is
> >>> key structure dependent.
> >>>
> >>> This is a pity, because a major use case for hbase is for people to
> >>> store history or log data. Normally people only want to retain the
> >>> data for a fixed period. For example, the US government default data
> >>> retention policy is 7 years. Those data are saved in chronological
> >>> order. The current TTL implementation doesn't work at all for that
> >>> kind of use case.
> >>>
> >>> In order for that use case to really work, hbase needs to have an
> >>> active thread that periodically runs and checks whether there is
> >>> data older than TTL, deletes the data older than TTL if necessary,
> >>> and compacts small regions older than a certain time period into
> >>> larger ones to save system resources. It can optimize the deletion
> >>> by deleting the whole region if it detects that the last timestamp
> >>> for the region is older than TTL. There should be a few parameters
> >>> to configure for hbase:
> >>>
> >>> 1. Whether to disable/enable the TTL thread.
> >>> 2. The interval at which the TTL thread will run. Maybe we can use a
> >>>    special value like 0 to indicate that we don't run the TTL thread,
> >>>    thus saving one configuration parameter. For the default, it
> >>>    should probably be set to 1 day.
> >>> 3. How small regions must be to be merged. It should be a percentage
> >>>    of the store size. For example, if 2 consecutive regions are only
> >>>    10% of the store size (default is 256M), we can initiate a region
> >>>    merge. We probably need a parameter to reduce the merging too. For
> >>>    example, we only merge regions whose largest timestamp is older
> >>>    than half of TTL.
> >>>
> >>>
> >>> Jimmy
> >>>
> >>> --------------------------------------------------
> >>> From: "Stack" <stack@duboce.net>
> >>> Sent: Wednesday, September 15, 2010 10:08 AM
> >>> To: <user@hbase.apache.org>
> >>> Subject: Re: hbase doesn't delete data older than TTL in old regions
> >>>
> >>> > On Wed, Sep 15, 2010 at 9:54 AM, Jinsong Hu <jinsong_hu@hotmail.com>
> >>> > wrote:
> >>> >> I have tested the TTL for hbase and found that it relies on
> >>> >> compaction to remove old data. However, if a region has data that
> >>> >> is older than TTL, and there is no trigger to compact it, then the
> >>> >> data will remain there forever, wasting disk space and memory.
> >>> >>
> >>> >
> >>> > So it's working as advertised then?
> >>> >
> >>> > There's currently an issue where we can skip major compactions if
> >>> > your write loading has a particular character: hbase-2990.
> >>> >
> >>> >
> >>> >> It appears that in this state, to really remove data older than TTL
> >>> >> we need to start a client side deletion request.
> >>> >
> >>> > Or run a manual major compaction:
> >>> >
> >>> > $ echo "major_compact TABLENAME" | ./bin/hbase shell
> >>> >
> >>> >
> >>> >
> >>> >> This is really a pity because it is a more expensive way to get
> >>> >> the job done. Another side effect of this is that as time goes on,
> >>> >> we will end up with some small regions if the data are saved in
> >>> >> chronological order in regions. It appears that hbase doesn't have
> >>> >> a mechanism to merge 2 consecutive small regions into a bigger one
> >>> >> at this time.
> >>> >
> >>> > $ ./bin/hbase org.apache.hadoop.hbase.util.Merge
> >>> > Usage: bin/hbase merge <table-name> <region-1> <region-2>
> >>> >
> >>> > Currently only works on offlined table but there's a patch available
> >>> > to make it run against onlined regions.
> >>> >
> >>> >
> >>> >> So if data is saved in chronological order, sooner or later we
> >>> >> will run out of capacity, even if the amount of data in hbase is
> >>> >> small, because we have lots of regions with small storage space.
> >>> >>
> >>> >> A much cheaper way to remove data older than TTL would be to
> >>> >> remember the latest timestamp for the region in the .META. table,
> >>> >> and if the time is older than TTL, we just adjust the row in
> >>> >> .META. and delete the store, without doing any compaction.
> >>> >>
> >>> >
> >>> > Say more on the above. It sounds promising. Are you suggesting that
> >>> > in addition to compactions that we also have a provision where we
> >>> > keep account of a storefile's latest timestamp (we already do this I
> >>> > believe) and that when now - storefile-timestamp > ttl, we just
> >>> > remove the storefile wholesale? That sounds like it could work, if
> >>> > that is what you are suggesting. Mind filing an issue w/ a detailed
> >>> > description?
> >>> >
> >>> > Thanks,
> >>> > St.Ack
> >>> >
> >>> >
> >>> >
> >>> >> Can this be added to the hbase requirements for a future release?
> >>> >>
> >>> >> Jimmy
> >>> >>
> >>> >>
> >>> >>
> >>> >
> >>
> >
>
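
For reference, the check proposed in the quoted thread (drop a storefile wholesale once now - storefile-timestamp > ttl) could be sketched roughly as below. This is only an illustration under stated assumptions: StoreFileTtlSketch, FileInfo, isWhollyExpired and selectExpired are hypothetical names, not HBase APIs, and the sketch assumes each storefile's max cell timestamp is already tracked, as the thread suggests.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of whole-file TTL expiry; none of these names are HBase APIs. */
final class StoreFileTtlSketch {

    /** Stand-in for per-storefile metadata: a name and the newest cell timestamp (ms). */
    static final class FileInfo {
        final String name;
        final long maxTimestampMs;

        FileInfo(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    /** True when even the newest cell in the file is older than the TTL. */
    static boolean isWhollyExpired(FileInfo file, long nowMs, long ttlMs) {
        return nowMs - file.maxTimestampMs > ttlMs;
    }

    /** Returns the files that could be dropped without running a compaction. */
    static List<FileInfo> selectExpired(List<FileInfo> files, long nowMs, long ttlMs) {
        List<FileInfo> expired = new ArrayList<>();
        for (FileInfo f : files) {
            if (isWhollyExpired(f, nowMs, ttlMs)) {
                expired.add(f);
            }
        }
        return expired;
    }
}
```

With a 7-day TTL, a file whose newest cell is 9 days old would be selected, while a file whose newest cell is 1 day old would be left for normal compaction to handle.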