lucene-java-user mailing list archives

From "Scott Smith" <ssm...@mainstreamdata.com>
Subject RE: To Sort or not to Sort
Date Fri, 17 Dec 2004 06:48:15 GMT
I think we have a winner.  Number 1 it is.  Thanks for the information.

	-----Original Message----- 
	From: Doug Cutting [mailto:cutting@apache.org] 
	Sent: Thu 12/16/2004 10:25 PM 
	To: Lucene Users List 
	Cc: 
	Subject: Re: To Sort or not to Sort
	
	

	Scott Smith wrote:
	> 1.    Simply use the built-in lucene sort functionality, cache the hit
	> list and then page through the list.  Adv: looks pretty
	> straightforward, I write less code.  Dis: for searches that return a large
	> number of hits (having a search return several hundred to a few thousand
	> hits is not uncommon), Lucene is sorting a lot of entries that don't
	> really need to be sorted (because the user will never look at them) and
	> sorting tends to be expensive.
	> 2.    The other solution uses a priority heap to collect the top N (or
	> next N) entries.  I still have to walk the entire hit list, but keeping
	> entries in a priority heap means I can determine the N entries I need
	> with a few comparisons and minimal sorting.  I don't have to sort a
	> bunch of entries whose order I don't care about.  Additionally, I don't
	> have to have all of the entries in memory at one time.  The big
	> disadvantage with this is that I have to write more code.  However, it
	> may be worth it if the performance difference is large enough.
	
	Lucene's built-in sorting code already performs the optimization you
	describe as (2).  So don't bother re-inventing it!
	
	Doug
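
For readers unfamiliar with the technique described as (2), here is a minimal sketch of collecting the top N entries with a bounded priority heap. It uses the standard java.util.PriorityQueue and is not Lucene's internal code; the names TopNCollector and topN are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNCollector {

    /**
     * Returns the n highest-ranked items according to 'order', best first.
     * Every hit is visited once, but at most n items are held at a time.
     */
    public static <T> List<T> topN(Iterable<T> hits, int n, Comparator<T> order) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap under 'order': the head is the weakest of the current top n.
        PriorityQueue<T> heap = new PriorityQueue<>(n, order);
        for (T hit : hits) {
            if (heap.size() < n) {
                heap.add(hit);
            } else if (order.compare(hit, heap.peek()) > 0) {
                // The new hit beats the weakest kept entry, so replace it.
                heap.poll();
                heap.add(hit);
            }
        }
        // Only the n survivors are sorted, not the whole hit list.
        List<T> result = new ArrayList<>(heap);
        result.sort(order.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Integer> scores = Arrays.asList(42, 7, 99, 15, 63, 8, 77);
        System.out.println(topN(scores, 3, Comparator.<Integer>naturalOrder()));
        // prints [99, 77, 63]
    }
}
```

Each hit costs at most O(log N) heap work, so collecting the top N of M hits is O(M log N) rather than the O(M log M) of a full sort. Per Doug's reply above, Lucene's built-in sorting already performs this optimization, so a sketch like this is only useful for understanding the idea.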
	
	
	
