From: Stack
To: dev@hbase.apache.org, lars hofhansl
Subject: Re: Strange performance behavior of SingleColumnValueFilter
Date: Wed, 26 Oct 2011 10:53:25 -0700
In-Reply-To: <1319651008.14654.YahooMailNeo@web121702.mail.ne1.yahoo.com>
References: <1319237376.29513.YahooMailNeo@web121716.mail.ne1.yahoo.com> <1319240083.71021.YahooMailNeo@web121719.mail.ne1.yahoo.com> <1319250858.76495.YahooMailNeo@web121719.mail.ne1.yahoo.com> <1319257328.31205.YahooMailNeo@web121719.mail.ne1.yahoo.com> <1319328963.35528.YahooMailNeo@web121701.mail.ne1.yahoo.com> <1319577747.38155.YahooMailNeo@web121706.mail.ne1.yahoo.com> <1319651008.14654.YahooMailNeo@web121702.mail.ne1.yahoo.com>

Yes. Should be off by default.
St.Ack

On Wed, Oct 26, 2011 at 10:43 AM, lars hofhansl wrote:
> Should there be an option to disable data block caching and only allow index block caching?
> For some analytical setups that might make sense.
> (obviously, the same can be achieved by setting cacheBlocks to false in every Scan object)
>
>
>
> ----- Original Message -----
> From: lars hofhansl
> To: "dev@hbase.apache.org"; lars hofhansl
> Cc:
> Sent: Tuesday, October 25, 2011 2:22 PM
> Subject: Re: Strange performance behavior of SingleColumnValueFilter
>
> It turns out that from other tests we did we had a stray
>
> <property>
>     <name>hfile.block.cache.size</name>
>     <value>0</value>
> </property>
>
> in our config. D'oh...
>
> When we removed that, the performance of SCVF was on par with ValueFilter.
>
> Setting cacheBlocks on the Scan object had almost no effect, so this must be related
> to the caching of Index Blocks.
> NEXT_ROW forces re-reading of Index Blocks it seems, whereas SKIP does not.
>
> So in summary:
> When hfile.block.cache.size=0, returning NEXT_ROW from a ScanQueryMatcher can be significantly slower than returning SKIP.
>
> -- Lars
>
>
> ----- Original Message -----
> From: lars hofhansl
> To: "dev@hbase.apache.org"
> Cc:
> Sent: Saturday, October 22, 2011 5:16 PM
> Subject: Re: Strange performance behavior of SingleValColumnFilter
>
> Thanks N.
>
> I do not think the time is lost in the memstore. We're working with fully compacted
> tables and do no updates during the read testing.
>
> We'll be spending more time to track this down on Monday.
>
>
> -- Lars
>
> ________________________________
> From: N Keywal
> To: dev@hbase.apache.org
> Sent: Saturday, October 22, 2011 2:53 PM
> Subject: Re: Strange performance behavior of SingleValColumnFilter
>
> Hi,
>
> I made a change recently on this. It was to fix a consistency bug rather
> than to improve performance, but in my test performance actually
> improved as well. It was for MemStore only. Is the time lost in the memstore
> or in the persisted part?
>
> Cheers,
>
> N.
>
> On Sat, Oct 22, 2011 at 6:22 AM, lars hofhansl wrote:
>
>> No, it was a trunk build. The local tests I did with a build from today.
>> Our test cluster is 1 or 2 weeks old.
>>
>> It seems it is just much cheaper to scan through a block that we already have, or
>> even to scan into the next block, than to reseek.
>>
>>
>>
>> ----- Original Message -----
>> From: Ted Yu
>> To: dev@hbase.apache.org; lars hofhansl
>> Cc:
>> Sent: Friday, October 21, 2011 8:22 PM
>> Subject: Re: Strange performance behavior of SingleValColumnFilter
>>
>> Was the following evaluation performed on 0.92?
>> Also, I assume you use ROWCOL bloom filter.
>> In TRUNK, Mikhail has put in lazy seek which I think should help
>> performance.
>>
>> Cheers
>>
>> On Fri, Oct 21, 2011 at 7:34 PM, lars hofhansl wrote:
>>
>> > We found that even with many columns, and even when the filter matches the
>> > first column, SKIP is still faster than NEXT_ROW.
>> > So either the reseek is extremely inefficient, or there is something else
>> > at play.
>> >
>> > It might be worthwhile to have StoreScanner upon SEEK_NEXT_ROW try the next
>> > N KVs (maybe N=10 or 20 or even bigger) to see if we
>> > get to the next row, and only if we didn't reach the next row do the
>> > reseek.
>> >
>> > ________________________________
>> > From: lars hofhansl
>> > To: "dev@hbase.apache.org"; lars hofhansl <lhofhansl@yahoo.com>
>> > Sent: Friday, October 21, 2011 4:34 PM
>> > Subject: Re: Strange performance behavior of SingleValColumnFilter
>> >
>> > Maybe it even makes sense. When the scan is limited to one column and there
>> > is only one version, SKIP would skip to the next row.
>> > But 10x slower for NEXT_ROW seems extreme.
>> >
>> >
>> >
>> > ________________________________
>> > From: lars hofhansl
>> > To: hbase-dev
>> > Sent: Friday, October 21, 2011 3:49 PM
>> > Subject: Strange performance behavior of SingleValColumnFilter
>> >
>> > We have been doing some performance testing on HBase filters. One outcome
>> > was HBASE-4626 (which I fixed and committed yesterday night).
>> >
>> > Now we found a rather strange behavior with SingleColumnValueFilter. On our
>> > test cluster it is 10x slower than ValueFilter, even when we restrict the
>> > scan to just the one column we are filtering on and set filterIfMissing to
>> > true.
>> > We are not seeing that with HBase in local mode, which points to some
>> > additional activity on the FS, which in HDFS would be slow compared to a
>> > local FS.
>> >
>> >
>> > Indeed it turns out the problem goes away when we replace all NEXT_ROW with
>> > SKIP in SingleColumnValueFilter.filterKeyValue: the performance is *much*
>> > better (on par with ValueFilter).
>> >
>> >
>> > We're using something pretty close to trunk for our tests.
>> > The tables are pretty wide, with only one version of each cell (and freshly
>> > major compacted).
>> >
>> >
>> > I do not know this part of the code that well (yet) and was wondering if
>> > somebody could chime in. Maybe this is related to HFileV2?
>> >
>> > I do recall there was something done to optimize reseeks. Generally I would
>> > have expected NEXT_ROW to be a major performance improvement.
>> >
>> > Any ideas, comments, pointers?
>> >
>> > Thanks.
>> >
>> > -- Lars
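
The idea floated in the thread (on SEEK_NEXT_ROW, step through the next N KVs before falling back to a reseek) can be sketched on a toy model. This is only an illustration under stated assumptions, not HBase's StoreScanner code: the class LookaheadSeekSketch, its methods, and the Move type are hypothetical names, the "store" is just a sorted in-memory list of row keys (one entry per KV), and a binary search stands in for the expensive reseek.

```java
import java.util.List;

/** Toy model of "look ahead N KVs before reseeking". All names are hypothetical. */
final class LookaheadSeekSketch {

    /** Where we landed and whether the (simulated) reseek was needed. */
    static final class Move {
        final int index;
        final boolean usedReseek;

        Move(int index, boolean usedReseek) {
            this.index = index;
            this.usedReseek = usedReseek;
        }
    }

    /**
     * Moves from position pos to the first KV of the next row.
     * rowKeys holds the row key of each KV, sorted ascending.
     * Up to `lookahead` KVs are checked sequentially; only if the row
     * has not changed by then do we fall back to the reseek.
     */
    static Move seekNextRow(List<String> rowKeys, int pos, int lookahead) {
        String currentRow = rowKeys.get(pos);
        int limit = Math.min(rowKeys.size(), pos + 1 + lookahead);
        for (int i = pos + 1; i < limit; i++) {
            if (!rowKeys.get(i).equals(currentRow)) {
                return new Move(i, false); // reached the next row cheaply
            }
        }
        if (limit == rowKeys.size()) {
            return new Move(limit, false); // ran off the end: no next row
        }
        return new Move(reseek(rowKeys, currentRow, limit), true);
    }

    /** Stand-in for a reseek: binary search for the first KV whose row sorts after `row`. */
    static int reseek(List<String> rowKeys, String row, int from) {
        int lo = from;
        int hi = rowKeys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (rowKeys.get(mid).compareTo(row) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

With narrow rows the sequential check finds the next row within the window and the reseek is never issued; with rows wider than the window the behavior is the same as a plain reseek plus at most N extra comparisons. What value of N pays off in a real scanner is not something this sketch can answer.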