lucene-java-user mailing list archives

From Antoine Baudoux
Subject Several questions about scoring/sorting + random sorting in an image/related application
Date Fri, 15 Jun 2007 14:51:57 GMT

	I'm developing an image database. Each Lucene document representing
an image contains (among other fields):

	- a date field
	- a collection field containing the ID of the collection the image  
belongs to.

	I want to be able to give a score to each collection. Collections
with a higher score appear first in the results. I want to avoid
re-indexing all the documents each time I change my collection scores.
	For example, on day 1 I decide to give collection #1 a score of 5 and
collection #3 a score of 10 --> images belonging to collection #3 appear
first in search results.
	On day 2 I give collection #3 a score of 2 --> images belonging to
collection #1 appear first in search results.
	I have read the Lucene docs, and from what I understand there are
several ways to achieve what I want:

	- I can use a very big BooleanQuery (an OR query, in fact) containing
one TermQuery per collection ID, setting the correct boost factor for
each TermQuery. The problem with this is that I have 300 collections,
so I would have a BooleanQuery with 300 clauses appended to every query
I make. I am afraid that it will be slow.

	- I can use a ValueSourceQuery, where for each document I compute a
custom score based on the value of the collection field. Will it be
faster than the first solution?

	- I can do advanced things such as writing a custom HitCollector, or  
a custom Query.

	- I can add another field to each document, containing a computed
custom score, and then sort on that field. But I want to avoid
this solution at all costs, since it would mean re-indexing all the
documents each time the collection scores change.

	What solution do you suggest? Is there another solution that I
didn't mention?
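
	To make the requirement concrete, here is a minimal plain-Java sketch of the idea behind the ValueSourceQuery / HitCollector options: the collection scores live in memory, outside the index, and are looked up per document at search time. It makes no Lucene calls; the class and method names (CollectionScores, setScore, scoreFor) are hypothetical, and the default score of 0 for an unknown collection is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: collection scores are kept outside the index,
// so changing one never requires re-indexing any document.
class CollectionScores {
    private final Map<String, Float> scores = new ConcurrentHashMap<>();

    // Set or change the score of one collection at any time.
    void setScore(String collectionId, float score) {
        scores.put(collectionId, score);
    }

    // Score for a document, looked up from the value of its collection field.
    // Unknown collections get a score of 0 (an assumption of this sketch).
    float scoreFor(String collectionId) {
        return scores.getOrDefault(collectionId, 0f);
    }
}
```

	With 300 collections the table is tiny, so the per-document cost would be one map lookup on the collection ID.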

	More recent documents should also come first. In fact, the final
sorting should be a weighted sum of the collection score of an
image and the date of an image: most recent images from the
best-scored collections come first, then most recent from lower-scored
collections, then less recent from best-scored, and so on. I would
also like to be able to adjust the balance between date/collection.

	What solution do you suggest?
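
	The weighted sum described above could look like the following plain-Java sketch. It assumes both inputs are normalized to the range 0..1 first (collection score divided by the highest score, date mapped linearly between the oldest and newest image); that normalization, the class name WeightedRank and all parameter names are assumptions, not anything Lucene provides.

```java
// Hypothetical helper: blends collection score and recency into one rank.
class WeightedRank {
    // weight = 1.0 ranks by collection score only, weight = 0.0 by date only.
    static double rank(double collectionScore, double maxCollectionScore,
                       long dateMillis, long oldestMillis, long newestMillis,
                       double weight) {
        // Collection score scaled to 0..1 relative to the best collection.
        double c = maxCollectionScore > 0 ? collectionScore / maxCollectionScore : 0.0;
        // Recency scaled to 0..1: 0 for the oldest image, 1 for the newest.
        double span = (double) (newestMillis - oldestMillis);
        double r = span > 0 ? (dateMillis - oldestMillis) / span : 0.0;
        // Higher result means the image should appear earlier.
        return weight * c + (1.0 - weight) * r;
    }
}
```

	With weight = 0.5, a recent image from the best collection ranks above a recent image from a collection with half that score, which in turn ranks above an old image from the best collection. Changing weight shifts the balance between date and collection.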

	I would also like to implement random sorting. My solution is: I
create 12 new fields, R1 to R12, for each document, each containing a
random number between 1 and 12. To get a random sort, I sort each day
with a different combination of R1 .. R12. For example:

	Day 1: I sort by R1, then R4, then R5...
	Day 2: I sort by R10, then R9, then R2...

Is it a good solution? Is there another way to do it?
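
	For comparison, one possible alternative (a different technique from the R1 .. R12 fields, shown here only as a plain-Java sketch) is to derive a pseudo-random sort key from a stable document ID and a per-day seed. Nothing extra is stored in the index, and the order changes whenever the seed changes. The class DailyShuffle, the numeric document IDs and the mixing function (in the style of SplitMix64) are all assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: a repeatable "random" order that depends on a daily seed.
class DailyShuffle {
    // Pseudo-random key for one document. For a fixed seed, every step is
    // invertible, so two distinct IDs never produce the same key.
    static long key(long docId, long daySeed) {
        long x = docId * 0x9E3779B97F4A7C15L + daySeed;
        x ^= x >>> 30;
        x *= 0xBF58476D1CE4E5B9L;
        x ^= x >>> 27;
        x *= 0x94D049BB133111EBL;
        x ^= x >>> 31;
        return x;
    }

    // Returns a sorted copy of the IDs; the same seed always gives the same order.
    static List<Long> order(List<Long> docIds, long daySeed) {
        List<Long> sorted = new ArrayList<>(docIds);
        sorted.sort(Comparator.comparingLong((Long id) -> key(id, daySeed)));
        return sorted;
    }
}
```

	The seed could be, for example, the number of days since some fixed date, so that every search on the same day sees the same order.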

	Many thanks in advance for your answers.

