lucene-java-user mailing list archives

From Chris Hostetter
Subject Re:
Date Tue, 16 Sep 2008 18:57:58 GMT

: Related topic: what if we need all the hits and not just the first 100?

Solr has a FAQ entry related to this that I think also applies here:

>How can I get ALL the matching documents back? ... How can I return an 
>unlimited number of rows?
>This is impractical in most cases. People typically only want to do this 
>when they know they are dealing with an index whose size guarantees the 
>result sets will always be small enough that they can feasibly be 
>transmitted in a manageable amount -- but if that's the case just specify 
>what you consider a "manageable amount" as your rows param and get the 
>best of both worlds (all the results when your assumption is right, and a 
>sanity cap on the result size if it turns out your assumptions are wrong) 

: TopDocCollector has a couple of drawbacks, one is that you need to know the
: number of hits before the query, and you can't overallocate as it will run you
: out of memory.  I take it this is the suggested workaround for that:

I think the suggested workaround at the moment is that if you *REALLY* 
want all results, you write your own HitCollector.  TopDocCollector is for 
collecting the top N docs ... if your N is "infinity" you don't need it.
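
A minimal sketch of that idea, not a definitive implementation.  It assumes 
the Lucene 2.x callback shape, collect(int doc, float score), and records 
every matching doc id in a java.util.BitSet, so memory is bounded by the 
index size rather than by a guess at the hit count.  HitCollectorSketch and 
AllHitsCollector are hypothetical names; the first is only a stand-in so the 
example compiles without Lucene on the classpath.  Scores are dropped here; 
keep them only if you need them.

```java
import java.util.BitSet;

// Hypothetical stand-in mirroring the collect(int doc, float score) callback
// of Lucene 2.x's HitCollector; declared locally so the sketch is self-contained.
abstract class HitCollectorSketch {
    abstract void collect(int doc, float score);
}

// Collects ALL matching doc ids. No priority queue and no "N" up front:
// one bit per document in the index, set when that document matches.
class AllHitsCollector extends HitCollectorSketch {
    private final BitSet bits;

    // maxDoc is an assumption: the caller passes the index's document count.
    AllHitsCollector(int maxDoc) {
        this.bits = new BitSet(maxDoc);
    }

    @Override
    void collect(int doc, float score) {
        bits.set(doc); // score is intentionally ignored in this sketch
    }

    boolean matched(int doc) {
        return bits.get(doc);
    }

    int hitCount() {
        return bits.cardinality();
    }
}
```

With real Lucene you would extend HitCollector itself and pass the instance 
to the search call instead of using the stand-in class.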

: Or do we make a replacement for TopDocCollector which doesn't have this
: drawback, and uses an alternative for PriorityQueue which allows its array to
: grow?

I don't see that as being much better -- you still wouldn't want to pass 
MAX_INT because what if there really are MAX_INT-42 results?  do you want 
the array to grow that big?  if you are prepared to deal with that many 
results, you might as well preallocate the array so that you don't wind up 
with two ginormous arrays during the "grow" steps.

I suppose there's some "middle" ground though ... a collector where you 
say "i expect to have fewer than N results, so allocate a priority queue 
that big and start with that, but i'm willing to accept (and want) up to 
the top M results, so grow the queue to M if needed"
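
A rough sketch of what such a "middle ground" collector might look like, 
under stated assumptions: it uses java.util.PriorityQueue (which grows its 
backing array on demand) instead of Lucene's fixed-size PriorityQueue, and 
the names ScoredHit and BoundedGrowingCollector are hypothetical.  The 
queue starts at the expected size N and is never allowed to hold more than 
M hits; once it is full, a new hit only replaces the current lowest-scoring 
one.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical value class pairing a doc id with its score.
class ScoredHit {
    final int doc;
    final float score;

    ScoredHit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

// Starts with room for expectedHits (N) and grows on demand up to maxHits (M).
class BoundedGrowingCollector {
    private final int maxHits;
    private final PriorityQueue<ScoredHit> queue; // min-heap: lowest score on top
    private int seen;

    BoundedGrowingCollector(int expectedHits, int maxHits) {
        if (expectedHits < 1 || maxHits < expectedHits) {
            throw new IllegalArgumentException("need 1 <= expectedHits <= maxHits");
        }
        this.maxHits = maxHits;
        this.queue = new PriorityQueue<ScoredHit>(
                expectedHits, (a, b) -> Float.compare(a.score, b.score));
    }

    void collect(int doc, float score) {
        seen++;
        if (queue.size() < maxHits) {
            queue.add(new ScoredHit(doc, score)); // may grow the backing array
        } else if (score > queue.peek().score) {
            queue.poll();                         // evict the weakest kept hit
            queue.add(new ScoredHit(doc, score));
        }
    }

    // Total number of hits offered, including ones that were not kept.
    int totalHits() {
        return seen;
    }

    // Kept hits, best score first.
    ScoredHit[] topHits() {
        ScoredHit[] out = queue.toArray(new ScoredHit[0]);
        Arrays.sort(out, (a, b) -> Float.compare(b.score, a.score));
        return out;
    }
}
```

The "two arrays during the grow step" cost does not go away with this 
approach, but it is capped by M rather than by whatever the query happens 
to match.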

