lucene-dev mailing list archives

From "Chuck Williams" <ch...@manawiz.com>
Subject RE: Contribution: better multi-field searching
Date Tue, 12 Oct 2004 21:01:38 GMT
Paul,

Thanks for the feedback and suggestions.  You are of course correct that
the implementation I've chosen will have poor performance if there are a
large number of subqueries in a MaxDisjunctionQuery.  However, this is
not the case in my usage, nor in the intended primary usage in general.
When using MaxDisjunctionQuery as a technique for searching terms across
multiple fields there will never be more subqueries than there are
distinct fields across which you want to search a single term.
Especially when combined with the technique of concatenating equally
important fields into larger search fields (so that only the reduced set
of search fields need be searched) this number is never more than a
handful.
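
To make the difference between summing and taking the maximum concrete, here is a minimal sketch. It does not use the Lucene API; the class and method names (MaxVsSum, sumScore, maxScore) and the per-field scores are hypothetical and chosen only for illustration. It assumes each subquery contributes one score per field for the same term.

```java
public class MaxVsSum {
    // BooleanQuery-style combination: add up the per-field scores.
    static double sumScore(double[] fieldScores) {
        double total = 0.0;
        for (double s : fieldScores) {
            total += s;
        }
        return total;
    }

    // Max-disjunction-style combination: keep only the best field's score.
    static double maxScore(double[] fieldScores) {
        double best = 0.0;
        for (double s : fieldScores) {
            best = Math.max(best, s);
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical per-field scores for one term in two documents.
        double[] docA = {0.5, 0.5, 0.5}; // mediocre match in three fields
        double[] docB = {0.9, 0.0, 0.0}; // strong match in one field

        System.out.println("sum: A=" + sumScore(docA) + " B=" + sumScore(docB));
        System.out.println("max: A=" + maxScore(docA) + " B=" + maxScore(docB));
    }
}
```

With these made-up numbers, summing ranks document A first while the maximum ranks document B first. The loop over subqueries is linear in their count, which is harmless for a handful of fields.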

I should probably at least add to the comment the fact that the
implementation is optimized for small numbers of subqueries.

It's an interesting question how the performance of MaxDisjunctionQuery
compares to that of BooleanQuery as the number of subqueries varies.  My
guess is that MaxDisjunctionQuery is faster for small numbers of
subqueries, but that for larger numbers BooleanQuery gets faster,
possibly much faster for very large numbers of subqueries (depending on
the distribution of documents being queried).  If I have a chance, I'll
run some comparative timings out of curiosity.

Did you see my IDF question at the bottom of the original note?  I'm
really curious why the square of IDF is used for Term and Phrase
queries, rather than just IDF.  It seems like it might be a bug?
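
For reference, one way the square can arise, stated here as an assumption about the vector-space view rather than as a confirmed account of the code: if the query vector and the document vector each carry an IDF factor, their dot product contains IDF twice. The symbols below (w for a term weight, tf for term frequency) are illustrative.

```latex
w_{t,q} = \mathrm{idf}(t), \qquad
w_{t,d} = \mathrm{tf}(t,d)\,\mathrm{idf}(t), \qquad
w_{t,q}\, w_{t,d} = \mathrm{tf}(t,d)\,\mathrm{idf}(t)^{2}
```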

Chuck

> -----Original Message-----
> From: Paul Elschot [mailto:paul.elschot@xs4all.nl]
> Sent: Tuesday, October 12, 2004 11:04 AM
> To: Lucene Developers List
> Subject: Re: Contribution: better multi-field searching
> 
> Chuck,
> 
> The scorer keeps a sorted array of subscorers and sorts it
> whenever needed. It's somewhat easier to implement that
> with a util.PriorityQueue, but can't say whether it would be
> faster.
> 
> For a definitely faster implementation one can start from
> Lucene's BooleanScorer and assume all clauses
> are optional. Instead of summing just use the maximum.
> 
> BooleanScorer works ahead for each scorer to avoid
> the need for keeping the scorers sorted.
> But you'll probably lose skipTo() when using BooleanScorer.
> 
> Regards,
> Paul Elschot.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

