lucene-java-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: To Sort or not to Sort
Date Fri, 17 Dec 2004 05:25:41 GMT
Scott Smith wrote:
> 1.	Simply use the built-in lucene sort functionality, cache the hit
> list and then page through the list.  Adv: looks pretty
> straightforward, I write less code.  Dis: for searches that return a large
> number of hits (having a search return several hundred to a few thousand
> hits is not uncommon), Lucene is sorting a lot of entries that don't
> really need to be sorted (because the user will never look at them) and
> sorting tends to be expensive.
> 2.	The other solution uses a priority heap to collect the top N (or
> next N) entries.  I still have to walk the entire hit list, but keeping
> entries in a priority heap means I can determine the N entries I need
> with a few comparisons and minimal sorting.  I don't have to sort a
> bunch of entries whose order I don't care about.  Additionally, I don't
> have to have all of the entries in memory at one time.  The big
> disadvantage with this is that I have to write more code.  However, it
> may be worth it if the performance difference is large enough. 

Lucene's built-in sorting code already performs the optimization you 
describe as (2): it keeps only the top n hits in a bounded priority
queue instead of sorting every match.  So don't bother reinventing it!
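For reference, the bounded-heap selection described in (2) can be sketched in plain Java. This is a minimal, self-contained illustration, not Lucene's code. The class `TopNHits`, its `Hit` pair type, the `page(scores, offset, pageSize)` method, and the use of a plain `float[]` of scores indexed by document id are all hypothetical. The sketch still visits every match, as Scott notes, but it holds at most `offset + pageSize` hits at once and never sorts the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: selects one page of the best-scoring hits with a bounded min-heap. */
public class TopNHits {

    /** A (document id, score) pair; illustrative only, not Lucene's API. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Orders hits worst-first: lower score first, ties broken so the higher doc id
    // counts as worse. The heap's head is therefore the weakest hit currently kept.
    private static int worstFirst(Hit a, Hit b) {
        int c = Float.compare(a.score, b.score);
        return c != 0 ? c : Integer.compare(b.doc, a.doc);
    }

    /**
     * Returns doc ids ranked [offset, offset + pageSize), best first.
     * At most offset + pageSize hits are held in memory; each further
     * match costs at most O(log k) heap work.
     */
    static int[] page(float[] scores, int offset, int pageSize) {
        int k = offset + pageSize;
        if (k <= 0) return new int[0];
        PriorityQueue<Hit> heap = new PriorityQueue<>(k, TopNHits::worstFirst);
        for (int doc = 0; doc < scores.length; doc++) {
            Hit h = new Hit(doc, scores[doc]);
            if (heap.size() < k) {
                heap.add(h);                              // heap not full yet
            } else if (worstFirst(h, heap.peek()) > 0) {
                heap.poll();                              // evict the weakest kept hit
                heap.add(h);
            }                                             // otherwise discard h unsorted
        }
        // Drain worst-first, then reverse to get best-first order.
        List<Hit> best = new ArrayList<>();
        while (!heap.isEmpty()) best.add(heap.poll());
        Collections.reverse(best);
        int end = best.size();
        int start = Math.min(offset, end);
        int[] out = new int[end - start];
        for (int i = start; i < end; i++) out[i - start] = best.get(i).doc;
        return out;
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 0.9f, 0.5f, 0.9f, 0.1f, 0.7f};
        System.out.println(Arrays.toString(page(scores, 0, 3)));
        System.out.println(Arrays.toString(page(scores, 3, 3)));
    }
}
```

Running `main` prints `[1, 3, 5]` and then `[2, 0, 4]`, the first and second pages of three hits each. Breaking score ties by lower document id is an arbitrary choice made for this sketch.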

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

