lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shai Erera <>
Subject Re: Optimization of memory usage in PriorityQueue
Date Mon, 22 Jun 2009 07:25:04 GMT
I think the common use case of TopScoreDocCollector is to request 10
results, then ask for 20 and so on. You ask for N results because you want
to display them, or manipulate them in some way.

However, if we do add this to the base PQ, it means an extra check for every
put() call. We've tried very hard recently to remove 'if' checks in
TopScoreDocCollector and TopFieldCollector, so adding one back does not seem
very appealing to me, especially since I IMO the use case of requesting 10K
results w/o knowing first there are more or so that match the query, is not
so common.

Of course, for the what I call common case, if I do know there are 10K
results, and request for 10K, but PQ starts w/ 10 (for example) and
gradually grows its array, I also lose performance, not just because of
extra 'if', but I also allocate ~20K slots (counting re-allocations).

You could write your own PQ which does that, although to achieve that we'll
need to make some methods on PQ not final. You will also need to write your
own TopScoreDocCollector version, since the latter does not accept PQ and is
a final class.

But perhaps others think it's worth it to add to PQ an extra 'if' and avoid
unnecessary large allocations? Or ... can we make PQ's methods not final,
and let others extend it and do what they want, not just impl lessThan()?
We can also have a factory method for PQ which returns a gradually growing
PQ, or an "init once" PQ, the latter will be used by TSDC and TFC and the
former can be asked for specifically.

Or ... we can do nothing, and say that one can write his own Collector, and
use Sun's PQ (or any other), if one has a case like "I need 10K results, but
I don't know how many are there, and specifically I want to optimize for the
case of 1 result".

BTW, notice that Sun's PQ does not have some of the methods we use today,
such as insertWithOverflow(), add() and updateTop(). Those were also
recently added for better performance.


On Mon, Jun 22, 2009 at 2:39 AM, Claudio . <> wrote:

>  Hi,
> The PriorityQueue used in TopDocCollector does not optimize the memory
> usage. If I do this search:
> TopDocs topDocs =, null, 10000);
> And only 1 document is returned. The PriorityQueue will create an Object[]
> of size 10000 + 1, but only one position of the array is used.
> The Lucene could implement a PriorityQueue with size extensible.
> The PriorityQueue of the Sun is size extensible. Why does not use it?
> Thanks
> ------------------------------
> What can you do with the new Windows Live? Find out<>

View raw message