lucene-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: accelerate hits.id(i) function: eliminating scoring for the sake of efficiency
Date Fri, 12 May 2006 00:35:47 GMT

: However what significantly slows us down is the hits.id(i) function.
: Can we accelerate it somehow "cleaning" Lucene code itself from
: scoring?

you said in your last message...

:     We don't need any scoring in our application domain, but
: efficiency is the key because we are getting tens thousand of hits for
: span queries; all these hits are necessary to collect.

if you are iterating over all of the matching documents for each query,
and you are getting more than a few dozen matches for each query, then you
should not be using the Hits object at all.

Hits is designed for the "common case" of paginated searches with
10-20 items per page, that rarely care about going past page 5 or 6, and
don't mind if the high numbered pages take a little longer.

If you are iterating over all the matches, then you want to be using a
HitCollector.  If you use a Hits object, and you iterate past the first
100 results, it will do your search twice under the covers; if you go past
the 200th result, it will do your search three times; past 400, it will do
it 4 times, etc...
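
To make the arithmetic above concrete, here is a rough model of the re-search pattern. It is not Lucene's code: the class name HitsResearchModel, the method searchesToRead, and the constant FIRST_BATCH are hypothetical, and the first batch of 100 results plus the doubling rule are assumptions read off the thresholds quoted above (100, 200, 400).

```java
// Rough model of the re-search pattern described above. This is NOT Lucene's
// code; the batch size of 100 and the doubling rule are assumptions taken
// from the numbers in this message.
class HitsResearchModel {

    // Assumed number of results fetched by the first search.
    static final int FIRST_BATCH = 100;

    // Returns how many times the query would run if the caller reads
    // the first `hitsRead` results through a Hits-style cache.
    static int searchesToRead(int hitsRead) {
        int searches = 1;            // the initial search
        long fetched = FIRST_BATCH;  // results available after that search
        while (hitsRead > fetched) {
            fetched *= 2;            // each re-search asks for twice as many results
            searches++;
        }
        return searches;
    }

    public static void main(String[] args) {
        int[] samples = {20, 100, 101, 201, 401, 50000};
        for (int n : samples) {
            System.out.println(n + " hits read -> " + searchesToRead(n) + " search(es)");
        }
    }
}
```

Under these assumptions, reading 50,000 hits through Hits would run the same query 10 times, where a HitCollector sees every match in a single pass.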



-Hoss



