lucene-java-user mailing list archives

From Mathieu Lecarme
Subject Re: Several questions about scoring/sorting + random sorting in an image/related application
Date Fri, 15 Jun 2007 15:20:53 GMT
Your request looks like a two-step query.
First step: you select the images, and then the collections.
Second step: you sort the collections.

Could BitVector help you?
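
The two steps could be sketched roughly as follows. This is only a plain-Java illustration, not Lucene API: the `TwoStepSort` class, the `Image` type, and `selectAndSort` are hypothetical names, `java.util.BitSet` stands in for Lucene's BitVector, and the collection scores are assumed to live in an in-memory map outside the index, so changing them needs no re-indexing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoStepSort {

    /** Hypothetical in-memory view of one indexed image document. */
    public static final class Image {
        final int docId;
        final int collectionId;
        final long dateMillis;

        public Image(int docId, int collectionId, long dateMillis) {
            this.docId = docId;
            this.collectionId = collectionId;
            this.dateMillis = dateMillis;
        }
    }

    /**
     * Step 1: keep the images whose document number is set in the bit set.
     * Step 2: sort them by the current score of their collection (highest
     * first), then by date (most recent first). Collections missing from
     * the map are treated as score 0 (an assumption of this sketch).
     */
    public static List<Image> selectAndSort(List<Image> images,
                                            BitSet matches,
                                            Map<Integer, Integer> collectionScores) {
        List<Image> selected = new ArrayList<>();
        for (Image img : images) {
            if (matches.get(img.docId)) {
                selected.add(img);
            }
        }
        selected.sort(
            Comparator.comparingInt(
                    (Image img) -> collectionScores.getOrDefault(img.collectionId, 0))
                .reversed()
                .thenComparing(
                    Comparator.comparingLong((Image img) -> img.dateMillis).reversed()));
        return selected;
    }

    public static void main(String[] args) {
        List<Image> images = Arrays.asList(
            new Image(0, 1, 100L), new Image(1, 3, 200L), new Image(2, 3, 300L));
        BitSet matches = new BitSet();
        matches.set(0, 3); // pretend all three documents matched the query

        Map<Integer, Integer> scores = new HashMap<>();
        scores.put(1, 5);
        scores.put(3, 10); // day 1: collection #3 outranks collection #1
        for (Image img : selectAndSort(images, matches, scores)) {
            System.out.println("day 1: doc " + img.docId);
        }

        scores.put(3, 2);  // day 2: only the map changes, the index does not
        for (Image img : selectAndSort(images, matches, scores)) {
            System.out.println("day 2: doc " + img.docId);
        }
    }
}
```

Only the score map is touched between day 1 and day 2, which is the point of keeping the sort as a separate second step.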

Antoine Baudoux wrote:
>     Hi,
>     I'm developing an image database. Each Lucene document
> representing an image contains (among other fields):
>     - a date field
>     - a collection field containing the ID of the collection the image
> belongs to.
>     I want to be able to give a score to each collection. Collections
> with a higher score appear first in the results. I want to avoid
> re-indexing all the documents each time I change my collection scores.
>     For example, on day 1 I decide to give collection #1 a score of 5 and
> collection #3 a score of 10 --> images belonging to collection #3 appear
> first in search results.
>     On day 2 I give collection #3 a score of 2 --> images belonging to
> collection #1 appear first in search results.
>     I have read the Lucene docs, and from what I understand there are
> many ways to achieve what I want:
>     - I can use a very big Boolean query (an OR query, in fact) containing
> one TermQuery per collection ID, setting the correct boost factor for
> each TermQuery. The problem with this is that I have 300 collections,
> so I have a Boolean query with 300 terms that I append to each query I
> make. I am afraid that it will be slow.
>     - I can use a ValueSourceQuery, where for each document I compute
> a custom score based on the value of the collection field. Will it be
> faster than the first solution?
>     - I can do advanced things such as writing a custom HitCollector,
> or a custom Query.
>     - I can add another field to each document, containing a computed
> custom score, then I could sort on that field. But I want to avoid
> this solution at all costs, since it would mean re-indexing all the
> documents each time the collection scores change.
>     What solution do you suggest? Is there another solution that I
> didn't mention?
>     More recent documents should also come first: in fact the final
> sorting should be a weighted sum of the collection score of an
> image and the date of an image: most recent images from the
> best-scored collections come first, then most recent from lower-scored
> collections, then less recent from best-scored, and so on. I would
> also like to be able to adjust the balance between date and collection score.
>     What solution do you suggest?
>     I would also like to implement random sorting. My solution is: I
> create 12 new fields R1 -> R12 for each document, each containing a
> random number between 1 and 12. To get a random sort, I sort each day
> with a different combination of R1 .. R12. For example:
>     Day 1: I sort by R1 then R4 then R5..
>     Day 2: I sort by R10 then R9 then R2....
>     etc...
> Is it a good solution? Is there another way to do it?
>     Many thanks in advance for your answers.
> Antoine
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
