lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: eliminating scoring for the sake of efficiency
Date Thu, 11 May 2006 21:42:21 GMT
On Thursday 11 May 2006 22:42, Boris Galitsky wrote:
> Hello
> 
>     We don't need any scoring in our application domain, but 
> efficiency is key because we are getting tens of thousands of hits for 
> span queries, and we need to collect all of them.
>     Is there a simple way to turn scoring off at indexing time, at 
> search time and while delivering document IDs, to save time?

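You could call getSpans() on the top-level SpanQuery, loop calling next() on
the returned Spans, and skip repeated doc() values in that loop. Spans are
delivered in document order, so all matches within one document are adjacent,
and comparing each doc() with the previous one is enough to drop duplicates.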
A counter in the loop would also give you the number of matching occurrences
of the SpanQuery.
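A minimal sketch of that loop, under stated assumptions. To keep it runnable
without a Lucene jar, the Spans interface and the ArraySpans, Result and
collect names below are hypothetical stand-ins; Spans mirrors only the
next()/doc()/start()/end() methods of org.apache.lucene.search.spans.Spans.
With real Lucene you would obtain the Spans from spanQuery.getSpans(reader),
and next() there also throws IOException.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanDocCollector {

    /** Hypothetical stand-in for the subset of Lucene's Spans used here. */
    interface Spans {
        boolean next(); // advance to the next match; false when exhausted
        int doc();      // document number of the current match
        int start();    // start position of the current match
        int end();      // end position of the current match
    }

    /** Hypothetical toy Spans over (doc, start, end) triples, sorted by doc then start. */
    static final class ArraySpans implements Spans {
        private final int[][] matches;
        private int i = -1;
        ArraySpans(int[][] matches) { this.matches = matches; }
        public boolean next() { return ++i < matches.length; }
        public int doc() { return matches[i][0]; }
        public int start() { return matches[i][1]; }
        public int end() { return matches[i][2]; }
    }

    /** Hypothetical result holder: distinct matching docs plus total occurrences. */
    static final class Result {
        final List<Integer> docs = new ArrayList<>();
        int occurrences = 0;
    }

    /** Walks the spans once; no scores are computed anywhere. */
    static Result collect(Spans spans) {
        Result r = new Result();
        int lastDoc = -1;
        while (spans.next()) {
            r.occurrences++;        // each span is one matching occurrence
            int d = spans.doc();
            if (d != lastDoc) {     // doc order makes duplicates adjacent
                r.docs.add(d);
                lastDoc = d;
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Spans spans = new ArraySpans(new int[][] {
            {3, 0, 2}, {3, 7, 9}, {8, 1, 3}, {12, 4, 6}, {12, 10, 12}, {12, 20, 22}
        });
        Result r = collect(spans);
        // prints "docs=[3, 8, 12] occurrences=6"
        System.out.println("docs=" + r.docs + " occurrences=" + r.occurrences);
    }
}
```

Comparing against the single previous doc() keeps memory constant apart from
the result list itself, which matters when tens of thousands of documents match.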

Using the Spans directly avoids computing any scores, so it should be slightly
more efficient than using a HitCollector, but don't hold your breath.

If you have ordered SpanQuerys without overlaps, the
NearSpansOrdered here might be a bit faster than the NearSpans
currently in Lucene:
http://issues.apache.org/jira/browse/LUCENE-413
(you'll also need the patch to SpanNearQuery).

Regards,
Paul Elschot


