From: lars hofhansl
Reply-To: lars hofhansl
To: "user@hbase.apache.org"
Subject: Re: Essential column family performance
Date: Mon, 8 Apr 2013 14:29:25 -0700 (PDT)
Message-ID: <1365456565.95404.YahooMailNeo@web140606.mail.bf1.yahoo.com>

In this case it is all handled at the server, and if you are doing scans you
still get the benefits of the sequential access pattern (rather than doing a
lot of seeks for point Gets).

-- Lars

________________________________
From: Jean-Marc Spaggiari
To: user@hbase.apache.org
Sent: Monday, April 8, 2013 10:19 AM
Subject: Re: Essential column family performance

Something I'm not getting: why not use separate tables instead of
CFs for a single table? Simply name your table tablename_cfname, then
you get rid of the CF# limitation?

Or are there big pros to having CFs?

JM

2013/4/8 Anoop John:
> Agree here. The effectiveness depends on what % of the data satisfies the
> condition and how it is distributed across HFile blocks. We will get a
> performance gain when we are able to skip some HFile blocks (from
> non essential CFs). Can you test with a different (lower) HFile block size?
>
> -Anoop-
>
>
> On Mon, Apr 8, 2013 at 8:19 PM, Ted Yu wrote:
>
>> I made the following change in TestJoinedScanners.java:
>>
>> -      int flag_percent = 1;
>> +      int flag_percent = 40;
>>
>> The test took longer but still favors joined scanner.
>> I got some new results:
>>
>> 2013-04-08 07:46:06,959 INFO  [main] regionserver.TestJoinedScanners(157):
>> Slow scanner finished in 7.424388 seconds, got 2050 rows
>> ...
>> 2013-04-08 07:46:12,010 INFO  [main] regionserver.TestJoinedScanners(157):
>> Joined scanner finished in 5.05063 seconds, got 2050 rows
>>
>> 2013-04-08 07:46:18,358 INFO  [main] regionserver.TestJoinedScanners(157):
>> Slow scanner finished in 6.348517 seconds, got 2050 rows
>> ...
>> 2013-04-08 07:46:22,946 INFO  [main] regionserver.TestJoinedScanners(157):
>> Joined scanner finished in 4.587545 seconds, got 2050 rows
>>
>> Looks like the effectiveness of the joined scanner is affected by the
>> distribution of data.
>>
>> Cheers
>>
>> On Sun, Apr 7, 2013 at 8:52 PM, lars hofhansl wrote:
>>
>> > Looking at the joined scanner test code, it sets it up such that 1% of the
>> > rows match, which would somewhat be in line with James' results.
>> >
>> > In my own testing a while ago I found a 100% improvement with 0% match.
>> >
>> > -- Lars
>> >
>> > ________________________________
>> > From: Ted Yu
>> > To: user@hbase.apache.org
>> > Sent: Sunday, April 7, 2013 4:13 PM
>> > Subject: Re: Essential column family performance
>> >
>> > I have attached 5416-TestJoinedScanners-0.94.txt to HBASE-5416 for your
>> > reference.
>> >
>> > On my MacBook, I got the following results from the test:
>> >
>> > 2013-04-07 16:08:17,474 INFO  [main] regionserver.TestJoinedScanners(157):
>> > Slow scanner finished in 7.973822 seconds, got 100 rows
>> > ...
>> > 2013-04-07 16:08:17,946 INFO  [main] regionserver.TestJoinedScanners(157):
>> > Joined scanner finished in 0.47235 seconds, got 100 rows
>> >
>> > Cheers
>> >
>> > On Sun, Apr 7, 2013 at 4:03 PM, Ted Yu wrote:
>> >
>> > > Looking at
>> > > https://issues.apache.org/jira/secure/attachment/12564340/5416-0.94-v3.txt,
>> > > I found that it didn't contain TestJoinedScanners, which shows the
>> > > difference in scanner performance:
>> > >
>> > >     LOG.info((slow ? "Slow" : "Joined") + " scanner finished in " +
>> > >       Double.toString(timeSec)
>> > >       + " seconds, got " + Long.toString(rows_count/2) + " rows");
>> > >
>> > > The test uses SingleColumnValueFilter:
>> > >
>> > >     SingleColumnValueFilter filter = new SingleColumnValueFilter(
>> > >         cf_essential, col_name, CompareFilter.CompareOp.EQUAL, flag_yes);
>> > >
>> > > It is possible that the custom filter you were using would exhibit a
>> > > different access pattern compared to SingleColumnValueFilter, e.g. does
>> > > your filter utilize hints?
>> > > It would be easier for me and other people to reproduce the issue you
>> > > experienced if you put your scenario in some test similar to
>> > > TestJoinedScanners.
>> > >
>> > > Will take a closer look at the code Monday.
>> > >
>> > > Cheers
>> > >
>> > > On Sun, Apr 7, 2013 at 11:37 AM, James Taylor wrote:
>> > >
>> > >> Yes, on 0.94.6. We have our own custom filter derived from FilterBase, so
>> > >> filterIfMissing isn't the issue - the results of the scan are correct.
>> > >>
>> > >> I can see that if the essential column family has more data compared to
>> > >> the non essential column family, the results would eventually even out.
>> > >> I was hoping to always be able to enable the essential column family
>> > >> feature. Is there an inherent reason why performance would degrade like
>> > >> this? Does it boil down to a single sequential scan versus many seeks?
>> > >>
>> > >> Thanks,
>> > >>
>> > >> James
>> > >>
>> > >>
>> > >> On 04/07/2013 07:44 AM, Ted Yu wrote:
>> > >>
>> > >>> James:
>> > >>> Your test was based on 0.94.6.1, right?
>> > >>>
>> > >>> What Filter were you using?
>> > >>>
>> > >>> If you used SingleColumnValueFilter, have you seen my comment here?
>> > >>> https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541229&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541229
>> > >>>
>> > >>> BTW the use case Max Lapan tried to address has the non essential column
>> > >>> family carrying considerably more data compared to the essential column
>> > >>> family.
>> > >>>
>> > >>> Cheers
>> > >>>
>> > >>>
>> > >>> On Sat, Apr 6, 2013 at 11:05 PM, James Taylor <jtaylor@salesforce.com> wrote:
>> > >>>
>> > >>>> Hello,
>> > >>>> We're doing some performance testing of the essential column family
>> > >>>> feature, and we're seeing some performance degradation when comparing
>> > >>>> with and without the feature enabled:
>> > >>>>
>> > >>>>                        Performance of scan relative
>> > >>>> % of rows selected     to not enabling the feature
>> > >>>> ---------------------  ----------------------------
>> > >>>> 100%                   1.0x
>> > >>>>  80%                   2.0x
>> > >>>>  60%                   2.3x
>> > >>>>  40%                   2.2x
>> > >>>>  20%                   1.5x
>> > >>>>  10%                   1.0x
>> > >>>>   5%                   0.67x
>> > >>>>   0%                   0.30x
>> > >>>>
>> > >>>> In our scenario, we have two column families. The key value from the
>> > >>>> essential column family is used in the filter, while the key value from
>> > >>>> the other, non essential column family is returned by the scan. Each row
>> > >>>> contains values for both key values, with the values being relatively
>> > >>>> narrow (less than 50 bytes). In this scenario, the only time we're
>> > >>>> seeing a performance gain is when less than 10% of the rows are selected.
>> > >>>>
>> > >>>> Is this a reasonable test? Has anyone else measured this?
>> > >>>>
>> > >>>> Thanks,
>> > >>>>
>> > >>>> James
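
For anyone who wants to turn the TestJoinedScanners timings quoted in this thread into ratios, here is a minimal Java sketch. The class name JoinedScannerSpeedup and the speedup helper are made up for illustration and are not part of HBase. The timings are copied from Ted Yu's log lines: the first run used the test's original flag_percent = 1, the other two used flag_percent = 40.

```java
import java.util.Locale;

// Hypothetical helper, not part of HBase: compares the "Slow scanner" and
// "Joined scanner" timings reported by TestJoinedScanners in this thread.
public class JoinedScannerSpeedup {

    /** Slow-scanner time divided by joined-scanner time; above 1 means the joined scanner was faster. */
    static double speedup(double slowSeconds, double joinedSeconds) {
        if (slowSeconds <= 0 || joinedSeconds <= 0) {
            throw new IllegalArgumentException("timings must be positive");
        }
        return slowSeconds / joinedSeconds;
    }

    public static void main(String[] args) {
        // Each row: flag_percent, slow scanner seconds, joined scanner seconds
        // (values taken from the log lines quoted in the thread).
        double[][] runs = {
            {1, 7.973822, 0.47235},
            {40, 7.424388, 5.05063},
            {40, 6.348517, 4.587545},
        };
        for (double[] run : runs) {
            System.out.printf(Locale.ROOT,
                "flag_percent=%d: joined scanner is %.2fx faster%n",
                (int) run[0], speedup(run[1], run[2]));
        }
    }
}
```

On these three runs the ratio falls from roughly 17x at flag_percent = 1 to about 1.4x to 1.5x at flag_percent = 40. That is the same direction as the observation that the joined scanner's benefit depends on how much of the data matches. It is only three single runs on one machine, so treat it as an illustration rather than a benchmark.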