lucene-java-user mailing list archives

From Toke Eskildsen
Subject RE: field sorted searches with unbounded hit count
Date Fri, 24 Jun 2011 06:39:10 GMT
On Thu, 2011-06-23 at 22:41 +0200, Tim Eck wrote:
>  I don't want to accuse anyone of bad code but always preallocating a 
>  potentially large array in org.apache.lucene.util.PriorityQueue seems
>  non-ideal for the search I want to run.

The current implementation of IndexSearcher uses threaded search where
each slice collects docIDs independently, then adds them to a shared
PriorityQueue one at a time. With this architecture, making the
PriorityQueue size-optimized would require either multiple resizings
(more GC activity, slightly more processing) or waiting for all search
threads to finish before constructing the queue (longer response time).
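
To make the "multiple resizings" option concrete, here is a minimal
sketch of a bounded top-N heap that starts small and doubles its backing
array on demand instead of allocating maxSize slots up front. It is not
Lucene's org.apache.lucene.util.PriorityQueue; the class LazyTopN, its
methods, the initial capacity of 16 and the resizeCount field are all
made up for illustration, and it is single-threaded, so it ignores the
shared-queue aspect entirely.

```java
import java.util.Arrays;

/** Hypothetical bounded top-N min-heap over float scores that grows lazily. */
final class LazyTopN {
    private float[] heap;      // min-heap: heap[0] is the weakest retained score
    private int size;
    private final int maxSize;
    int resizeCount;           // number of array copies, i.e. the extra GC work

    LazyTopN(int maxSize) {
        if (maxSize < 1) throw new IllegalArgumentException("maxSize must be >= 1");
        this.maxSize = maxSize;
        // Assumption: start with at most 16 slots, whatever maxSize is.
        this.heap = new float[Math.min(maxSize, 16)];
    }

    void insert(float score) {
        if (size < maxSize) {
            if (size == heap.length) {
                // Double the array, but never beyond maxSize.
                int newCap = (int) Math.min((long) heap.length * 2, maxSize);
                heap = Arrays.copyOf(heap, newCap);
                resizeCount++;
            }
            heap[size] = score;
            siftUp(size++);
        } else if (score > heap[0]) {
            // Queue is full: replace the weakest entry and restore heap order.
            heap[0] = score;
            siftDown(0);
        }
    }

    int size() { return size; }

    int capacity() { return heap.length; }

    /** Returns the retained scores, best first. */
    float[] sortedDescending() {
        float[] out = Arrays.copyOf(heap, size);
        Arrays.sort(out);
        for (int i = 0, j = out.length - 1; i < j; i++, j--) {
            float t = out[i]; out[i] = out[j]; out[j] = t;
        }
        return out;
    }

    private void siftUp(int i) {
        float v = heap[i];
        while (i > 0) {
            int parent = (i - 1) >>> 1;
            if (heap[parent] <= v) break;
            heap[i] = heap[parent];
            i = parent;
        }
        heap[i] = v;
    }

    private void siftDown(int i) {
        float v = heap[i];
        int half = size >>> 1;          // nodes at index >= half are leaves
        while (i < half) {
            int child = 2 * i + 1;
            int right = child + 1;
            if (right < size && heap[right] < heap[child]) child = right;
            if (v <= heap[child]) break;
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = v;
    }
}
```

With this sketch, asking for a million hits but only finding five costs
16 slots rather than a million; the price is the occasional array copy
counted in resizeCount once many hits do arrive.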

The current implementation works really well when requesting small
result sets. It works less well for larger sets (partly because of memory
allocation, partly because the standard heap-based priority queue has
horrible locality, making it perform rather badly when it cannot be
contained in the cache) and, as you have observed, really badly for the
full document set. Finding a better general solution that covers all
three cases is a real challenge, and a very interesting one, I might add.
Of course one can always special-case, but using a Collector seems like
the way to go there.
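
As a rough sketch of what such a special case could look like for the
full document set: collect every hit into a growable primitive array and
sort once at the end, rather than pushing each hit through a heap. The
class below is plain Java and does not extend Lucene's Collector; the
name CollectAllSketch, the two-argument collect method and the idea of a
precomputed int sort key per document are assumptions made purely for
illustration.

```java
import java.util.Arrays;

/** Hypothetical collect-everything-then-sort helper; not a Lucene class. */
final class CollectAllSketch {
    private long[] packed = new long[1024];  // grows with the hits actually seen
    private int count;

    /**
     * Per-hit callback. sortKey is an assumed precomputed int sort value
     * for the document; docId must be non-negative.
     */
    void collect(int docId, int sortKey) {
        if (count == packed.length) {
            packed = Arrays.copyOf(packed, packed.length * 2);
        }
        // Sort key in the high 32 bits, docId in the low 32 bits, so that
        // sorting the longs orders by key first and docId second.
        packed[count++] = ((long) sortKey << 32) | (docId & 0xFFFFFFFFL);
    }

    int size() { return count; }

    /** Returns all collected docIds ordered by ascending sort key, then docId. */
    int[] sortedDocIds() {
        long[] copy = Arrays.copyOf(packed, count);
        Arrays.sort(copy);                    // one pass over a flat primitive array
        int[] docs = new int[count];
        for (int i = 0; i < count; i++) {
            docs[i] = (int) copy[i];          // low 32 bits hold the docId
        }
        return docs;
    }
}
```

Memory use here follows the number of hits instead of a preallocated
maximum, and the single sort works on one flat array, which should be
kinder to the cache than heap sifting. Whether it actually beats the
queue for a given index is something to measure, not assume.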
