From: Paul Elschot
To: java-user@lucene.apache.org
Subject: Re: Search performance using BooleanQueries in BooleanQueries
Date: Wed, 7 Nov 2007 00:02:06 +0100

On Tuesday 06 November
2007 23:14:01 Mike Klaas wrote:
> On 29-Oct-07, at 9:43 AM, Paul Elschot wrote:
> > On Friday 26 October 2007 09:36:58 Ard Schrijvers wrote:
> >> +prop1:a +prop2:b +prop3:c +prop4:d +prop5:e
> >>
> >> is much faster than
> >>
> >> (+(+(+(+prop1:a +prop2:b) +prop3:c) +prop4:d) +prop5:e)
> >>
> >> where the second one is a result from BooleanQuery in BooleanQuery,
> >> and all have Occur.MUST.
> >
> > Simplifying boolean queries like this is not available in Lucene,
> > but it would have a positive effect on search performance,
> > especially when prop1:a and prop2:b have a high document frequency.
>
> Wait--shouldn't the outer-most BooleanQuery provide most of this
> speedup already (since it should be skipTo'ing between the nested
> BooleanQueries and the outermost)? Is it the indirection and
> sub-query management that is causing the performance difference, or
> differences in skipTo behaviour?

The usual Lucene answer to performance questions: it depends.

After every hit, next() needs to be called on a subquery before
skipTo() can be used to find the next hit. It is currently not defined
which subquery will be used for this first next().

The structure of the scorers normally follows the structure of the
BooleanQueries, so the indirection over the deep subquery scorers
could well be relevant to performance, too.

Which of these factors actually dominates performance is hard to
predict in advance. The point of skipTo() is that it tries to avoid
disk I/O as much as possible the first time the query is executed.
Later executions are much more likely to hit the OS cache, and then
the indirections will be more relevant to performance.

I'd like to have a good way to do a performance test on a first query
execution, in the sense that it does not hit the OS cache for its
skipTo() executions, but I have not found a good way yet.
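To make the next()/skipTo() interplay and the extra indirection a bit
more concrete, here is a minimal toy sketch in plain Java. It is not
Lucene code: DocIterator, TermIterator, Conjunction, flat() and
nested() are hypothetical names made up for this illustration, and
skipTo() here simply means "first doc id >= target", which is an
assumption of the toy rather than the exact contract of Lucene's
scorers. Both shapes return the same hits; the nested one just routes
every call through more wrapper objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of conjunction over sorted doc ids; NOT Lucene's Scorer API. */
final class ConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a scorer: yields doc ids in increasing order. */
    interface DocIterator {
        int next();              // advance to the next doc id
        int skipTo(int target);  // toy semantics: first doc id >= target
    }

    /** Walks a sorted array of doc ids, like the postings of one term. */
    static final class TermIterator implements DocIterator {
        private final int[] docs;
        private int pos = -1;
        TermIterator(int... docs) { this.docs = docs; }
        private int doc() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        public int next() { pos++; return doc(); }
        public int skipTo(int target) {
            if (pos < 0) pos = 0;
            while (pos < docs.length && docs[pos] < target) pos++;
            return doc();
        }
    }

    /** All clauses required (Occur.MUST style); clauses may be conjunctions too. */
    static final class Conjunction implements DocIterator {
        private final List<DocIterator> subs;
        private int current = -1;
        Conjunction(List<DocIterator> subs) { this.subs = subs; }
        public int next() {
            if (current == NO_MORE_DOCS) return current;
            // After a hit, one clause has to move with next() before skipTo()
            // can be used; this toy always picks the first clause.
            return current = align(subs.get(0).next());
        }
        public int skipTo(int target) {
            if (current >= target) return current;
            return current = align(subs.get(0).skipTo(target));
        }
        /** Skip every clause to the candidate until all of them agree on it. */
        private int align(int candidate) {
            boolean agreed = false;
            while (!agreed && candidate != NO_MORE_DOCS) {
                agreed = true;
                for (DocIterator sub : subs) {
                    int d = sub.skipTo(candidate);
                    if (d != candidate) { candidate = d; agreed = false; break; }
                }
            }
            return candidate;
        }
    }

    /** Shape of: +prop1:a +prop2:b +prop3:c ... (one conjunction, no nesting). */
    static DocIterator flat(int[][] postings) {
        List<DocIterator> subs = new ArrayList<>();
        for (int[] p : postings) subs.add(new TermIterator(p));
        return new Conjunction(subs);
    }

    /** Shape of: (+(+(+prop1:a +prop2:b) +prop3:c) ...), one wrapper per level. */
    static DocIterator nested(int[][] postings) {
        DocIterator acc = new TermIterator(postings[0]);
        for (int i = 1; i < postings.length; i++) {
            acc = new Conjunction(Arrays.asList(acc, new TermIterator(postings[i])));
        }
        return acc;
    }

    /** Drains an iterator with next() and returns all hits. */
    static List<Integer> collect(DocIterator it) {
        List<Integer> hits = new ArrayList<>();
        for (int d = it.next(); d != NO_MORE_DOCS; d = it.next()) hits.add(d);
        return hits;
    }

    public static void main(String[] args) {
        int[][] postings = { {1, 3, 5, 7, 9, 11}, {3, 5, 7, 11, 20}, {1, 3, 7, 11} };
        System.out.println(collect(flat(postings)));
        System.out.println(collect(nested(postings)));
    }
}
```

In the nested shape each skipTo() on the outermost conjunction is
forwarded down through every level before it reaches prop1:a, which is
the kind of indirection meant above. The toy works purely in memory,
so it says nothing about disk I/O or the OS cache.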
Regards,
Paul Elschot