Date: Fri, 21 Oct 2011 21:22:08 -0700 (PDT)
From: lars hofhansl
Subject: Re: Strange performance behavior of SingleValColumnFilter
To: "dev@hbase.apache.org"

No, it was a trunk build. The local tests I did used a build from today; our test cluster runs a build that is one or two weeks old.

It seems it is just much cheaper to scan through a block that we already have, or even to scan into the next block, than to reseek.
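The heuristic proposed further down in this thread (on SEEK_NEXT_ROW, have StoreScanner try the next N KVs and only reseek if the next row was not reached) can be sketched with a toy in-memory model. This is not StoreScanner code: the class NextRowHeuristic, its Cell type, and the next/reseek counters are hypothetical, and the binary search merely stands in for a real reseek.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the "try N next() calls before reseeking" idea; not HBase code.
// NextRowHeuristic, Cell and the counters below are hypothetical names.
public class NextRowHeuristic {

    // Simplified stand-in for a KeyValue: a row key plus a column name.
    static final class Cell {
        final String row;
        final String column;
        Cell(String row, String column) { this.row = row; this.column = column; }
    }

    private final List<Cell> cells; // sorted by (row, column), like cells in an HFile
    private int pos = 0;            // current scanner position
    int nextCalls = 0;              // cheap single-cell steps taken so far
    int reseeks = 0;                // expensive reseeks performed so far

    NextRowHeuristic(List<Cell> sortedCells) {
        this.cells = sortedCells;
    }

    // The cell under the cursor, or null when the scan is exhausted.
    Cell peek() {
        return pos < cells.size() ? cells.get(pos) : null;
    }

    // Move to the first cell of the next row: try up to maxNexts next() calls,
    // and only fall back to a reseek if the row has not changed by then.
    void seekNextRow(int maxNexts) {
        Cell current = peek();
        if (current == null) {
            return;
        }
        String row = current.row;
        for (int i = 0; i < maxNexts; i++) {
            pos++;
            nextCalls++;
            Cell c = peek();
            if (c == null || !c.row.equals(row)) {
                return; // reached the next row (or the end) without a reseek
            }
        }
        // Fallback "reseek": binary search for the first cell past this row.
        reseeks++;
        int lo = pos;
        int hi = cells.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cells.get(mid).row.compareTo(row) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        pos = lo;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        for (int r = 0; r < 3; r++) {
            for (int c = 0; c < 5; c++) {
                cells.add(new Cell("row" + r, "col" + c));
            }
        }
        NextRowHeuristic scanner = new NextRowHeuristic(cells);
        scanner.seekNextRow(10); // row0 has 5 cells: 5 next() calls reach row1
        scanner.seekNextRow(2);  // 2 next() calls stay in row1, so it reseeks
        System.out.println(scanner.peek().row + " nexts=" + scanner.nextCalls
                + " reseeks=" + scanner.reseeks);
        // prints "row2 nexts=7 reseeks=1"
    }
}
```

With five cells per row, N=10 reaches the next row after five next() calls and no reseek, while N=2 falls back to one reseek. Choosing N trades a bounded number of cheap in-block steps against the cost of a reseek; the right value would have to be measured on a real cluster.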
----- Original Message -----
From: Ted Yu
To: dev@hbase.apache.org; lars hofhansl
Cc:
Sent: Friday, October 21, 2011 8:22 PM
Subject: Re: Strange performance behavior of SingleValColumnFilter

Was the following evaluation performed on 0.92?
Also, I assume you use the ROWCOL bloom filter.

In TRUNK, Mikhail has put in lazy seek, which I think should help performance.

Cheers

On Fri, Oct 21, 2011 at 7:34 PM, lars hofhansl wrote:

> We found that even with many columns, and even when the filter matches the
> first column, SKIP is still faster than NEXT_ROW.
> So either the reseek is extremely inefficient, or there is something else
> at play.
>
> It might be worthwhile to have StoreScanner, upon SEEK_NEXT_ROW, try the next
> N KVs (maybe N=10 or 20 or even bigger) to see if we
> get to the next row, and only if we didn't reach the next row do the
> reseek.
>
> ________________________________
> From: lars hofhansl
> To: "dev@hbase.apache.org"; lars hofhansl <lhofhansl@yahoo.com>
> Sent: Friday, October 21, 2011 4:34 PM
> Subject: Re: Strange performance behavior of SingleValColumnFilter
>
> Maybe it even makes sense. When the scan is limited to one column and there
> is only one version, SKIP would skip to the next row.
> But 10x slower for NEXT_ROW seems extreme.
>
> ________________________________
> From: lars hofhansl
> To: hbase-dev
> Sent: Friday, October 21, 2011 3:49 PM
> Subject: Strange performance behavior of SingleValColumnFilter
>
> We have been doing some performance testing on HBase filters. One outcome
> was HBASE-4626 (which I fixed and committed last night).
>
> Now we found a rather strange behavior with SingleColumnValueFilter. On our
> test cluster it is 10x slower than ValueFilter, even when we restrict the
> scan to just the one column we are filtering on and set filterIfMissing to
> true.
>
> We are not seeing that with HBase in local mode, which points to some
> additional activity on the FS, which in HDFS would be slow compared to a
> local FS.
>
> Indeed, it turns out the problem goes away: when we replace all NEXT_ROW with
> SKIP in SingleColumnValueFilter.filterKeyValue, the performance is *much*
> better (on par with ValueFilter).
>
> We're using something pretty close to trunk for our tests.
> The tables are pretty wide, with only one version of each cell (and freshly
> major compacted).
>
> I do not know this part of the code that well (yet) and was wondering if
> somebody could chime in. Maybe this is related to HFileV2?
>
> I do recall there was something done to optimize reseeks. Generally I would
> have expected NEXT_ROW to be a major performance improvement.
>
> Any ideas, comments, pointers?
>
> Thanks.
>
> -- Lars
>
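To illustrate the observation in this thread that, with the scan limited to one column and a single version, SKIP already moves to the next row, here is a toy model. It is not HBase code: ReturnCode borrows three of the names from HBase's Filter.ReturnCode, while SkipVsNextRow, filterCell, and the seek counter are hypothetical. It shows that both return codes select the same rows, but NEXT_ROW is charged one (simulated) reseek per non-matching row, while SKIP only steps to the next cell.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not HBase code. ReturnCode mirrors three of the names in
// HBase's Filter.ReturnCode; the scan loop and seek counter are assumptions.
public class SkipVsNextRow {

    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    // One cell per row: the scan is assumed to be restricted to the single
    // filtered column, with one version per cell.
    static final class Cell {
        final String row;
        final String value;
        Cell(String row, String value) { this.row = row; this.value = value; }
    }

    // Hypothetical value check in the spirit of SingleColumnValueFilter:
    // keep matching cells; on a mismatch answer either SKIP or NEXT_ROW.
    static ReturnCode filterCell(Cell c, String wanted, boolean useNextRow) {
        if (c.value.equals(wanted)) {
            return ReturnCode.INCLUDE;
        }
        return useNextRow ? ReturnCode.NEXT_ROW : ReturnCode.SKIP;
    }

    static int seeks; // simulated reseeks performed by the last scan

    // Returns the rows whose cell matched the wanted value.
    static List<String> scan(List<Cell> cells, String wanted, boolean useNextRow) {
        seeks = 0;
        List<String> rows = new ArrayList<>();
        int i = 0;
        while (i < cells.size()) {
            Cell c = cells.get(i);
            switch (filterCell(c, wanted, useNextRow)) {
                case INCLUDE:
                    rows.add(c.row);
                    i++;
                    break;
                case SKIP:
                    i++; // step to the next cell, which here is already the next row
                    break;
                case NEXT_ROW: {
                    seeks++; // charge one reseek; the position update is a plain loop here
                    String row = c.row;
                    while (i < cells.size() && cells.get(i).row.equals(row)) {
                        i++;
                    }
                    break;
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        for (int r = 0; r < 6; r++) {
            cells.add(new Cell("row" + r, r % 3 == 0 ? "hit" : "miss"));
        }
        System.out.println(scan(cells, "hit", false) + " seeks=" + seeks); // [row0, row3] seeks=0
        System.out.println(scan(cells, "hit", true) + " seeks=" + seeks);  // [row0, row3] seeks=4
    }
}
```

The model only counts seeks; it says nothing about why a real reseek would be 10x slower on HDFS than on a local FS, which is the open question in this thread.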