lucene-dev mailing list archives

From <apa...@lucene.com>
Subject Re: improve performance of "AND-queries"
Date Sat, 25 Oct 2003 12:45:19 GMT
From Antonio Gulli <gulli@di.unipi.it> on 23 Oct 2003:
> >It looks like both parts of the query are executed separately and then
> >they are merged. If Lucene were able to execute the part of the query
> >with fewer results (text:go) first and then only check whether the second
> >part (title:"The Right Way") matches, those queries would be much faster.
> >
> This should be the standard way to process a conjunctive query.
> See, for instance, "Managing Gigabytes", chapter 4.3.

Perhaps it would be a bit faster, but it can also use much more memory.  If the clause with
fewer results still matches a large subset of the collection, then the scores and document
numbers of all of these matches must be stored, requiring at least eight bytes per intermediate
match.  Lucene's existing BooleanQuery algorithm operates using very little memory.
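
To make the memory cost concrete, here is a minimal sketch of the buffer-then-filter
approach described above. It is not Lucene code: the class and method names are hypothetical,
and reading the eight-byte figure as one four-byte document number plus one four-byte score
is an assumption.

```java
import java.util.Arrays;

// Illustrative only; not part of Lucene.
final class CandidateFirstSketch {
    // Assumed layout of one buffered match: int doc number + float score.
    static final int BYTES_PER_MATCH = Integer.BYTES + Float.BYTES;

    // Lower bound on the buffer size for a given number of intermediate matches.
    static long bufferBytes(long intermediateMatches) {
        return intermediateMatches * BYTES_PER_MATCH;
    }

    // docs/scores hold every match of the rarer clause (the intermediate buffer).
    // Entries whose doc also appears in otherSorted are compacted to the front;
    // the number of surviving entries is returned.
    static int filterInPlace(int[] docs, float[] scores, int count, int[] otherSorted) {
        int kept = 0;
        for (int i = 0; i < count; i++) {
            if (Arrays.binarySearch(otherSorted, docs[i]) >= 0) {
                docs[kept] = docs[i];
                scores[kept] = scores[i];
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // One million buffered matches need at least 8,000,000 bytes.
        System.out.println(bufferBytes(1_000_000L));
    }
}
```

The buffer grows with the number of matches of the rarer clause, so its size depends on
the query and the collection rather than being fixed in advance.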

And what really is the savings?  One still must enumerate all of the TermDocs or TermPositions
of the more frequent clause.  So all that one saves is the amount of logic in the inner loop.
But Lucene already optimizes this by using a combination of a hash-table and bitwise integer
operations, so that each step of the inner loop is constant time, and not proportional to
the number of clauses in the conjunction.
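
The following is a simplified sketch of how a small hash table and per-clause bit masks
can give a constant-time inner step, as described above. It is not Lucene's actual
BooleanQuery code: the names, the table size, and the windowing scheme are assumptions
made for illustration, and scoring is left out. Postings are assumed to be sorted,
non-negative document numbers.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only; not part of Lucene.
final class BitmaskConjunctionSketch {
    static final int TABLE_SIZE = 1 << 10;   // assumed size; memory use is fixed by this
    static final int MASK = TABLE_SIZE - 1;

    // postings[c] is the sorted list of documents matching clause c.
    static List<Integer> conjunction(int[][] postings) {
        if (postings.length < 1 || postings.length > 31) {
            throw new IllegalArgumentException("need between 1 and 31 clauses");
        }
        int required = (1 << postings.length) - 1;   // one bit per clause, all set
        int[] bits = new int[TABLE_SIZE];            // the hash table, keyed by doc & MASK
        int[] cursor = new int[postings.length];
        int maxDoc = -1;
        for (int[] p : postings) {
            if (p.length > 0) maxDoc = Math.max(maxDoc, p[p.length - 1]);
        }
        List<Integer> hits = new ArrayList<>();
        // Process the collection one window of TABLE_SIZE documents at a time.
        for (long base = 0; base <= maxDoc; base += TABLE_SIZE) {
            long end = base + TABLE_SIZE;
            for (int c = 0; c < postings.length; c++) {
                int[] p = postings[c];
                // Inner step: one table lookup and one OR, whatever the clause count.
                while (cursor[c] < p.length && p[cursor[c]] < end) {
                    bits[p[cursor[c]] & MASK] |= 1 << c;
                    cursor[c]++;
                }
            }
            // A document matches the conjunction when every clause set its bit.
            for (int slot = 0; slot < TABLE_SIZE; slot++) {
                if (bits[slot] == required) hits.add((int) base + slot);
                bits[slot] = 0;   // reset for the next window
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] postings = { {1, 5, 1030, 2048}, {5, 7, 1030, 3000}, {0, 5, 1030} };
        System.out.println(conjunction(postings));   // prints [5, 1030]
    }
}
```

Every posting of every clause is still enumerated, but the only state kept is the
fixed-size table, regardless of how many documents each clause matches.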

I'd be happy to see an alternate, faster implementation that does not use huge amounts of
memory, but, until then, I'm not convinced this is a better approach.

Doug
