lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Deeply nested boolean query performance
Date Fri, 01 Apr 2005 19:26:20 GMT
Paul,

Thanks for your very thorough response.  It is very helpful.

For all my projects, I'm using the latest Subversion codebase and 
staying current with any changes there, so that is very good news.

	Erik

On Apr 1, 2005, at 1:10 PM, Paul Elschot wrote:

> On Friday 01 April 2005 18:14, Erik Hatcher wrote:
>> I will soon create some tests for this scenario, but wanted to run 
>> this
>> by the list as well....
>
> Great, see below.
>
>> What performance differences would be seen between a query like this:
>>
>> 	a AND b AND c AND d
>
> This will use a single ConjunctionScorer, and it is the fastest form.
>
>> and this one:
>>
>> 	((a AND b) AND c) AND d
>
>> In other words, will building a query with nested boolean queries be
>> substantially slower than a single boolean query with many clauses?  
>> Or
>> might it be the other way around?
>
> This will use a ConjunctionScorer for (a AND b), assuming a and
> b are terms. For the other AND operators a BooleanScorer will be
> used in 1.4.3. The development version will use a ConjunctionScorer
> at each AND operator.
>
> The main difference between a ConjunctionScorer and a BooleanScorer
> is the use of skipTo(), ie. the forwarding information in the term docs
> index, that allows to 'fast forward' to a given document.
> This 'fast forward' is useful for AND queries, and ConjunctionScorer 
> does it,
> BooleanScorer simply uses next() instead. The next() method iterates
> over all documents in a term docs index.
>
> In other words, the nested form should be significantly slower than
> the flat form in 1.4.3, and just a bit slower in the development 
> version.
>
> Another skipTo advantage comes from this form:
> (a OR b) and c
> In 1.4.3, this uses a BooleanScorer for both operators, making this
> as much work as:
> (a OR b) OR c.
> In the development version, the OR operator gets a DisjunctionScorer,
> and the AND operator a ConjunctionScorer, both allowing the use
> of skipTo(), even on the a and b terms.
>
> In this context (a OR b) can also be for example a fuzzy query or a 
> prefix
> query.
>
> The development version also uses skipTo() on b in the following 
> situations:
> +a b
> a -b
>
> So, when you measure, please use both 1.4.3 and the development version
> to see the differences. And, off course, the larger your index, the 
> better.
> As the code is still a bit young, you might be in for some surprises, 
> too.
> skipTo() has the biggest advantages when the index data is not
> available in any cache.
>
> Regards,
> Paul Elschot.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message