lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Antoine Baudoux ...@taktik.be>
Subject Re: Several questions about scoring/sorting + random sorting in an image/related application
Date Fri, 15 Jun 2007 15:52:26 GMT
The problem is that i want lucene to do the sorting, because the  
query qould return thousands of results, and I'm displaying documents  
one page at a time.
--
Antoine Baudoux
Development Manager
ab@taktik.be
Tél.: +32 2 333 58 44
GSM: +32 499 534 538
Fax.: +32 2 648 16 53


On 15 Jun 2007, at 17:42, Mathieu Lecarme wrote:

> First step is to feed a Set with "collection"
> Second step is to sort it.
>
> With a sortedSet, you can do that, isnt'it?
>
> M.
>
>
> Antoine Baudoux a écrit :
>> Could-you be more precise? I dont understand what you mean.
>>
>>
>>
>> On 15 Jun 2007, at 17:20, Mathieu Lecarme wrote:
>>
>>> Your request seems to be a two steps query.
>>> First step, you select image, and then collection
>>> Second step, you sort collection.
>>>
>>> BitVector can help you?
>>>
>>> M.
>>> Antoine Baudoux a écrit :
>>>>     Hi,
>>>>
>>>>     I'm developping an image database. Each lucene document
>>>> representing an image contains (among other fields ):
>>>>
>>>>     - a date field
>>>>     - a collection field containing the ID of the collection the  
>>>> image
>>>> belongs to.
>>>>
>>>>     I want to be able to give a score to each collection.  
>>>> Collections
>>>> with a higher score appear first in the results. I want to avoid
>>>> re-indexing all the documents each time i change my collection  
>>>> scores.
>>>>
>>>>     For example on day 1 I decide to give collection #1 a 5  
>>>> score and
>>>> collection #3 a 10 score --> images belonging to collection #3  
>>>> appear
>>>> first in search results.
>>>>     One day 2 i give collection #3 a 2 score --> images  
>>>> belonging to
>>>> collection #1 appear first in search results.
>>>>
>>>>     I have read the lucene docs, and from what i understand  
>>>> there are
>>>> many ways to achieve what I want :
>>>>
>>>>
>>>>     - I can use a Very big Boolean query (OR query in fact)  
>>>> containing
>>>> one TermQuery per collection ID, setting the correct boost  
>>>> factor for
>>>> each termquery. The problem with this is that i have 300  
>>>> collections,
>>>> so i have a boolean query with 300 terms that i append to each  
>>>> query i
>>>> make. I am afraid that it will be slow.
>>>>
>>>>     - I can use a ValueSourceQuery, where for each document i  
>>>> compute
>>>> a custom score based on the value of the collection field. Will  
>>>> it be
>>>> faster than the first solution?
>>>>
>>>>     - I can do advanced things such as writing a custom  
>>>> HitCollector,
>>>> or a custom Query.
>>>>
>>>>     - I can add another field to each document, containing a  
>>>> computed
>>>> custom score, then i could sort on that field. But i want to avoid
>>>> this solution at all costs, since it would mean re-indexing all the
>>>> documents each time the collection scores change.
>>>>
>>>>     What solution do you suggest?  Is there another solution that i
>>>> didnt mention?
>>>>
>>>>     More recent documents should also come first : In fact the  
>>>> final
>>>> sorting should be a ponderated sum between the collection score  
>>>> of an
>>>> image and the date of an image : most recent images from the
>>>> best-scored collections come first, then most recent from less- 
>>>> scrored
>>>> collections, then less recent from best scored, and so on. I would
>>>> also like to be able to adjust the balance between date/collection
>>>> score.
>>>>
>>>>     What solution do you suggest?
>>>>
>>>>
>>>>     I would also like to implement random-sorting. My solution  
>>>> is : i
>>>> create 12 new fields R1 -> R12 for each document, each containing a
>>>> random number between 1 and 12. To get a random sort, i sort  
>>>> each day
>>>> with a different combination of R1 .. R12. For example :
>>>>
>>>>     Day 1 : i sort by R1 then R4 then R5..
>>>>     Day 2 : i sort by R10 then R9 then R2....
>>>>     etc...
>>>>
>>>> Is it a good solution? Is there another way to do it?
>>>>
>>>>
>>>>     Very big thx in advance for your answers.
>>>>
>>>> Antoine
>>>>
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message