From: lars hofhansl
To: user@hbase.apache.org
Date: Wed, 6 Jun 2012 02:06:10 -0700 (PDT)
Subject: Re: When does compaction actually occur?

Hi Tom,

You have set MIN_VERSIONS to 1.
That tells HBase that for this column family you want to keep at least
1 version of a cell around regardless of whether it expired (due to TTL)
or not.
I think if you remove that it will behave as you expect.

As a general rule a compaction will never influence visibility of data
that was inserted before the compaction (except for RAW scans), and hence
you should never need to ask when a compaction happens - unless you are
running out of disk space.

-- Lars

________________________________
From: Tom Brown
To: user@hbase.apache.org
Sent: Tuesday, June 5, 2012 2:37 PM
Subject: Re: When does compaction actually occur?

Lars,

In response to your earlier email, I'm not completely sure whether or
not I'm using a raw scan. The scan is performed in a region server
coprocessor initialized as such:

    Scan scan = new Scan()
        .setMaxVersions(1)
        .setTimeRange(myMinTimeStamp, myMaxTimeStamp)
        .setStartRow(myStartRow)
        .setStopRow(myStopRow);
    scan.setCaching(1000);

    InternalScanner scanner = ((RegionCoprocessorEnvironment)getEnvironment())
            .getRegion().getScanner(scan);

The scan is indeed being filtered to the range I provide (using
setTimeRange), but it will retrieve records much older than should be
allowed given the TTL.

I have multiple tables set up in a similar fashion, but here is a
description of one of them:

{NAME => 'facts', FAMILIES => [{NAME => 'd', BLOOMFILTER => 'ROW',
COMPRESSION => 'SNAPPY', VERSIONS => '1', TTL
=> '3600', MIN_VERSIONS => '1'}]}

I'm building an OLAP cube for this project and want to make sure the
data size doesn't grow through the roof. Whether or not data expires
after exactly one hour is not an absolute requirement for this use
case. But I want to know why the system is not behaving as I think I
configured it to behave.

Thanks!

-- Tom

On Sun, Jun 3, 2012 at 2:57 AM, Lars George wrote:
> What Amandeep says, and also keep in mind that with the current
> selection process HBase holds O(log N) files for N data. So say for 2GB
> region sizes you get 2-3 files. This means it is compacting files very
> "aggressively", and most of these are "all files included" ones... which
> are then promoted to major compactions implicitly. That way your
> predicate deletes should be in effect and you will need scheduled major
> compactions only every so often.
>
> Lars
>
> On Jun 2, 2012, at 1:04 AM, Amandeep Khurana wrote:
>
>> Tom,
>>
>> Old cells will get deleted as a part of the next major compaction,
>> which is typically recommended to be done once a day, when the load on
>> the system is at its lowest.
>>
>> FWIW… To have a TTL of 3600 take effect, you'll have to do a major
>> compaction every hour, which is an expensive operation, especially at
>> scale. Chances are that your I/O loads will shoot up and latencies will
>> spike for operations to HBase. Can you tell us why a TTL of 3600s is of
>> interest? What are your access patterns?
>>
>> -Amandeep
>>
>>
>> On Friday, June 1, 2012 at 3:59 PM, Tom Brown wrote:
>>
>>> I have a table that holds rotating data. It has a TTL of 3600. For
>>> some reason, when I scan the table I still get old cells that are much
>>> older than that TTL.
>>>
>>> I have tried issuing a compaction request via the web UI, but that
>>> didn't seem to do anything.
>>>
>>> Am I misunderstanding the data model used by HBase?
>>> Is there anything else I can check to verify the functionality of my
>>> integration?
>>>
>>> I am using HBase 0.92 with Hadoop 1.0.2.
>>>
>>> Thanks in advance!
>>>
>>> --Tom
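
The retention rule described in the reply at the top of this thread (MIN_VERSIONS keeps at least that many versions of a cell even after the TTL has passed) can be sketched roughly as below. This is a simplified, hypothetical model in plain Java, not HBase code: the class `RetentionSketch`, the method `retained`, and the exact expiry boundary are assumptions made for illustration, and the real logic inside the region server may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model (NOT HBase code): which versions of ONE cell stay
// visible given TTL, VERSIONS (max) and MIN_VERSIONS on a column family.
class RetentionSketch {

    /**
     * @param timestampsMs version timestamps of a single cell, in any order
     * @param nowMs        current time in milliseconds
     * @param ttlSeconds   column family TTL in seconds
     * @param maxVersions  VERSIONS setting (upper bound on kept versions)
     * @param minVersions  MIN_VERSIONS setting (0 means no floor)
     * @return timestamps of the retained versions, newest first
     */
    static List<Long> retained(long[] timestampsMs, long nowMs, long ttlSeconds,
                               int maxVersions, int minVersions) {
        // Walk the versions from newest to oldest.
        Long[] sorted = Arrays.stream(timestampsMs).boxed().toArray(Long[]::new);
        Arrays.sort(sorted, Comparator.reverseOrder());

        long oldestAllowed = nowMs - ttlSeconds * 1000L;
        List<Long> kept = new ArrayList<>();
        for (long ts : sorted) {
            if (kept.size() >= maxVersions) {
                break; // never keep more than VERSIONS
            }
            boolean expired = ts < oldestAllowed;
            // An expired version survives only while fewer than
            // MIN_VERSIONS versions have been kept so far.
            if (!expired || kept.size() < minVersions) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long now = 10_000_000L;
        long[] versions = { now - 7_200_000L }; // a single version, two hours old

        // TTL => 3600, VERSIONS => 1, MIN_VERSIONS => 1 (the 'facts' table above)
        System.out.println(retained(versions, now, 3600, 1, 1));
        // Same table with MIN_VERSIONS removed (modeled here as 0)
        System.out.println(retained(versions, now, 3600, 1, 0));
    }
}
```

Under this model, the first call keeps the two-hour-old version because it is the only one and MIN_VERSIONS is 1, which matches the behavior reported in the thread; the second call drops it once the floor is removed.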