From: Erik Hatcher
To: java-user@lucene.apache.org
Subject: Re: Deeply nested boolean query performance
Date: Fri, 1 Apr 2005 14:26:20 -0500

Paul,

Thanks for your very thorough response. It is very helpful.

For all my projects, I'm using the latest Subversion codebase and
staying current with any changes there, so that is very good news.

	Erik

On Apr 1, 2005, at 1:10 PM, Paul Elschot wrote:

> On Friday 01 April 2005 18:14, Erik Hatcher wrote:
>> I will soon create some tests for this scenario, but wanted to run
>> this by the list as well....
>
> Great, see below.
>
>> What performance differences would be seen between a query like this:
>>
>> 	a AND b AND c AND d
>
> This will use a single ConjunctionScorer, and it is the fastest form.
>
>> and this one:
>>
>> 	((a AND b) AND c) AND d
>>
>> In other words, will building a query with nested boolean queries be
>> substantially slower than a single boolean query with many clauses?
>> Or might it be the other way around?
>
> This will use a ConjunctionScorer for (a AND b), assuming a and
> b are terms. For the other AND operators a BooleanScorer will be
> used in 1.4.3. The development version will use a ConjunctionScorer
> at each AND operator.
>
> The main difference between a ConjunctionScorer and a BooleanScorer
> is the use of skipTo(), i.e. the forwarding information in the term
> docs index, which allows a scorer to 'fast forward' to a given document.
> This 'fast forward' is useful for AND queries, and ConjunctionScorer
> does it; BooleanScorer simply uses next() instead. The next() method
> iterates over all documents in a term docs index.
>
> In other words, the nested form should be significantly slower than
> the flat form in 1.4.3, and just a bit slower in the development
> version.
>
> Another skipTo advantage comes from this form:
> 	(a OR b) AND c
> In 1.4.3, this uses a BooleanScorer for both operators, making this
> as much work as:
> 	(a OR b) OR c
> In the development version, the OR operator gets a DisjunctionScorer,
> and the AND operator a ConjunctionScorer, both allowing the use
> of skipTo(), even on the a and b terms.
>
> In this context (a OR b) can also be, for example, a fuzzy query or a
> prefix query.
>
> The development version also uses skipTo() on b in the following
> situations:
> 	+a b
> 	a -b
>
> So, when you measure, please use both 1.4.3 and the development version
> to see the differences. And, of course, the larger your index, the
> better.
> As the code is still a bit young, you might be in for some surprises,
> too.
> skipTo() has the biggest advantages when the index data is not
> available in any cache.
>
> Regards,
> Paul Elschot.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
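
To make the skipTo() versus next() difference described above concrete, here is a rough, self-contained sketch. It does not use Lucene at all: Postings, SkipToSketch, andWithSkipTo, andWithNextOnly and the touched counter are hypothetical names invented for this illustration, and the binary search is only a stand-in for the skip information Lucene keeps in the term docs index. The next()-only variant is a loose simplification of visiting every entry of every clause, not the actual BooleanScorer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for one term's sorted doc-id list (not Lucene's TermDocs). */
final class Postings {
    private final int[] docs;
    private int pos = -1;
    int touched = 0; // number of entries examined so far

    Postings(int... sortedDocs) { this.docs = sortedDocs; }

    int doc() { return docs[pos]; }

    /** Step to the following entry, one at a time. */
    boolean next() {
        pos++;
        if (pos < docs.length) { touched++; return true; }
        return false;
    }

    /** Move past the current entry to the first entry >= target.
     *  Binary search is an assumption standing in for real skip data. */
    boolean skipTo(int target) {
        int lo = pos + 1, hi = docs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            touched++;
            if (docs[mid] < target) lo = mid + 1; else hi = mid;
        }
        pos = lo;
        return pos < docs.length;
    }
}

final class SkipToSketch {
    /** a AND b, letting whichever list is behind skip ahead to the other's doc. */
    static List<Integer> andWithSkipTo(Postings a, Postings b) {
        List<Integer> hits = new ArrayList<>();
        if (!a.next() || !b.next()) return hits;
        while (true) {
            if (a.doc() == b.doc()) {
                hits.add(a.doc());
                if (!a.next() || !b.next()) return hits;
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) return hits;
            } else {
                if (!b.skipTo(a.doc())) return hits;
            }
        }
    }

    /** a AND b using next() only: every entry of both lists is visited. */
    static List<Integer> andWithNextOnly(Postings a, Postings b) {
        Map<Integer, Integer> matchedClauses = new TreeMap<>();
        while (a.next()) matchedClauses.merge(a.doc(), 1, Integer::sum);
        while (b.next()) matchedClauses.merge(b.doc(), 1, Integer::sum);
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : matchedClauses.entrySet()) {
            if (e.getValue() == 2) hits.add(e.getKey()); // both clauses matched
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] common = new int[10000];
        for (int i = 0; i < common.length; i++) common[i] = i; // term in every doc

        Postings rare1 = new Postings(5, 5000, 9000), common1 = new Postings(common);
        List<Integer> fast = andWithSkipTo(rare1, common1);
        System.out.println("skipTo:    " + fast + " entries examined: "
                + (rare1.touched + common1.touched));

        Postings rare2 = new Postings(5, 5000, 9000), common2 = new Postings(common);
        List<Integer> slow = andWithNextOnly(rare2, common2);
        System.out.println("next only: " + slow + " entries examined: "
                + (rare2.touched + common2.touched));
    }
}
```

Both methods return the same hits; only the number of entries examined differs, and the gap grows with the size of the common term's list. That is in line with the remark above that larger indexes show the difference better, though real timings will of course depend on the actual index and Lucene version.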