lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Antoine Baudoux ...@taktik.be>
Subject Re: Several questions about scoring/sorting + random sorting in an image/related application
Date Fri, 15 Jun 2007 15:32:59 GMT
Could-you be more precise? I dont understand what you mean.



On 15 Jun 2007, at 17:20, Mathieu Lecarme wrote:

> Your request seems to be a two steps query.
> First step, you select image, and then collection
> Second step, you sort collection.
>
> BitVector can help you?
>
> M.
> Antoine Baudoux a écrit :
>>     Hi,
>>
>>     I'm developping an image database. Each lucene document
>> representing an image contains (among other fields ):
>>
>>     - a date field
>>     - a collection field containing the ID of the collection the  
>> image
>> belongs to.
>>
>>     I want to be able to give a score to each collection. Collections
>> with a higher score appear first in the results. I want to avoid
>> re-indexing all the documents each time i change my collection  
>> scores.
>>
>>     For example on day 1 I decide to give collection #1 a 5 score and
>> collection #3 a 10 score --> images belonging to collection #3 appear
>> first in search results.
>>     One day 2 i give collection #3 a 2 score --> images belonging to
>> collection #1 appear first in search results.
>>
>>     I have read the lucene docs, and from what i understand there are
>> many ways to achieve what I want :
>>
>>
>>     - I can use a Very big Boolean query (OR query in fact)  
>> containing
>> one TermQuery per collection ID, setting the correct boost factor for
>> each termquery. The problem with this is that i have 300 collections,
>> so i have a boolean query with 300 terms that i append to each  
>> query i
>> make. I am afraid that it will be slow.
>>
>>     - I can use a ValueSourceQuery, where for each document i compute
>> a custom score based on the value of the collection field. Will it be
>> faster than the first solution?
>>
>>     - I can do advanced things such as writing a custom HitCollector,
>> or a custom Query.
>>
>>     - I can add another field to each document, containing a computed
>> custom score, then i could sort on that field. But i want to avoid
>> this solution at all costs, since it would mean re-indexing all the
>> documents each time the collection scores change.
>>
>>     What solution do you suggest?  Is there another solution that i
>> didnt mention?
>>
>>     More recent documents should also come first : In fact the final
>> sorting should be a ponderated sum between the collection score of an
>> image and the date of an image : most recent images from the
>> best-scored collections come first, then most recent from less- 
>> scrored
>> collections, then less recent from best scored, and so on. I would
>> also like to be able to adjust the balance between date/collection  
>> score.
>>
>>     What solution do you suggest?
>>
>>
>>     I would also like to implement random-sorting. My solution is : i
>> create 12 new fields R1 -> R12 for each document, each containing a
>> random number between 1 and 12. To get a random sort, i sort each day
>> with a different combination of R1 .. R12. For example :
>>
>>     Day 1 : i sort by R1 then R4 then R5..
>>     Day 2 : i sort by R10 then R9 then R2....
>>     etc...
>>
>> Is it a good solution? Is there another way to do it?
>>
>>
>>     Very big thx in advance for your answers.
>>
>> Antoine
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message