lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Several questions about scoring/sorting + random sorting in an image/related application
Date Fri, 15 Jun 2007 16:18:18 GMT
Another possibility is to re-think this a bit. You are "displaying
documents one page at a time", which I take to mean you
are displaying some number (say 50) of document summaries
per page.

I'm also assuming that you want to display ALL documents
from, say, collection 32 and then (and only then) display
the documents in the next-ranking collection.

Let's further assume, for discussion purposes, that no collection
has fewer than 50 docs, but that's not a requirement.

At server startup, you compute the number of documents
in each collection. When a user pages through results, you can
reasonably confine the search to just the two or three
collections that the requested page touches, and easily add a
boosted clause for those few collections.
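
A minimal sketch of such a boosted clause, written as a string in
the classic query parser syntax (field:term^boost). The field name
"collection", the class name and the boost values are assumptions
made for illustration, not something taken from the original index:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a parenthesised OR clause with one boosted
// term per collection, e.g. "(collection:100^10.0 collection:400^5.0)".
final class CollectionBoostClause {

    static String build(String field, Map<Integer, Float> boostByCollection) {
        if (boostByCollection.isEmpty()) {
            throw new IllegalArgumentException("at least one collection is required");
        }
        StringJoiner clause = new StringJoiner(" ", "(", ")");
        for (Map.Entry<Integer, Float> e : boostByCollection.entrySet()) {
            // field:term^boost is the query parser's per-term boost syntax
            clause.add(field + ":" + e.getKey() + "^" + e.getValue());
        }
        return clause.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Float> boosts = new LinkedHashMap<>();
        boosts.put(100, 10.0f); // higher-ranked collection (assumed values)
        boosts.put(400, 5.0f);
        System.out.println(build("collection", boosts));
    }
}
```

The clause could then be added to the user's query as a required
clause, along the lines of +(user query) +(collection:100^10.0
collection:400^5.0). Whether the boosts dominate the rest of the
score depends on the query, so the values would need tuning.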

The tricky part of this would be knowing that a request
for results 200-250 spans collections 100 and 400, and
making sure that the page boundaries are computed
correctly.
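
The boundary computation could be sketched roughly as follows. It
assumes the per-collection counts are held in a map iterated from
best-ranked to worst-ranked collection, that the counts reflect the
documents the current query actually matches, and that result
positions are zero-based. The names PageSpan, Slice and slicesFor
are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PageSpan {

    /** One part of a page: skip `offset` docs of `collectionId`, then take `count`. */
    static final class Slice {
        final int collectionId;
        final int offset;
        final int count;

        Slice(int collectionId, int offset, int count) {
            this.collectionId = collectionId;
            this.offset = offset;
            this.count = count;
        }
    }

    /** countsInRankOrder must iterate from best-ranked to worst-ranked collection. */
    static List<Slice> slicesFor(Map<Integer, Integer> countsInRankOrder,
                                 int start, int pageSize) {
        List<Slice> slices = new ArrayList<>();
        int skipped = 0;          // docs in collections ranked above the current one
        int remaining = pageSize; // docs still needed to fill the page
        for (Map.Entry<Integer, Integer> e : countsInRankOrder.entrySet()) {
            if (remaining == 0) {
                break;
            }
            int size = e.getValue();
            int end = skipped + size; // exclusive global end position of this collection
            if (start < end) {
                int offset = Math.max(start - skipped, 0);
                int take = Math.min(size - offset, remaining);
                if (take > 0) {
                    slices.add(new Slice(e.getKey(), offset, take));
                    remaining -= take;
                }
            }
            skipped = end;
        }
        return slices;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        counts.put(32, 120);  // assumed counts, best-ranked collection first
        counts.put(100, 100);
        counts.put(400, 300);
        for (Slice s : slicesFor(counts, 200, 50)) {
            System.out.println(s.collectionId + " offset=" + s.offset + " count=" + s.count);
        }
    }
}
```

With the assumed counts above, a request starting at position 200
with a page size of 50 takes the last 20 documents of collection 100
and the first 30 of collection 400, so only those two collections
need to appear in the query.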

Don't know if this is reasonable, but it occurred to me...

Best
Erick




On 6/15/07, Antoine Baudoux <ab@taktik.be> wrote:
>
> The problem is that I want Lucene to do the sorting, because the
> query would return thousands of results, and I'm displaying documents
> one page at a time.
> --
> Antoine Baudoux
> Development Manager
> ab@taktik.be
> Tél.: +32 2 333 58 44
> GSM: +32 499 534 538
> Fax.: +32 2 648 16 53
>
>
> On 15 Jun 2007, at 17:42, Mathieu Lecarme wrote:
>
> > The first step is to feed a Set with "collection".
> > The second step is to sort it.
> >
> > With a SortedSet you can do that, can't you?
> >
> > M.
> >
> >
> > Antoine Baudoux wrote:
> >> Could you be more precise? I don't understand what you mean.
> >>
> >>
> >>
> >> On 15 Jun 2007, at 17:20, Mathieu Lecarme wrote:
> >>
> >>> Your request seems to be a two-step query.
> >>> First step: you select images, and then collections.
> >>> Second step: you sort the collections.
> >>>
> >>> Could BitVector help you?
> >>>
> >>> M.
> >>> Antoine Baudoux wrote:
> >>>>     Hi,
> >>>>
> >>>>     I'm developing an image database. Each Lucene document
> >>>> representing an image contains (among other fields):
> >>>>
> >>>>     - a date field
> >>>>     - a collection field containing the ID of the collection the
> >>>> image belongs to.
> >>>>
> >>>>     I want to be able to give a score to each collection.
> >>>> Collections with a higher score appear first in the results. I
> >>>> want to avoid re-indexing all the documents each time I change my
> >>>> collection scores.
> >>>>
> >>>>     For example, on day 1 I decide to give collection #1 a score
> >>>> of 5 and collection #3 a score of 10 --> images belonging to
> >>>> collection #3 appear first in search results.
> >>>>     On day 2 I give collection #3 a score of 2 --> images
> >>>> belonging to collection #1 appear first in search results.
> >>>>
> >>>>     I have read the Lucene docs, and from what I understand
> >>>> there are many ways to achieve what I want:
> >>>>
> >>>>
> >>>>     - I can use a very big Boolean query (an OR query, in fact)
> >>>> containing one TermQuery per collection ID, setting the correct
> >>>> boost factor for each TermQuery. The problem with this is that I
> >>>> have 300 collections, so I have a Boolean query with 300 terms
> >>>> that I append to each query I make. I am afraid that it will be
> >>>> slow.
> >>>>
> >>>>     - I can use a ValueSourceQuery, where for each document I
> >>>> compute a custom score based on the value of the collection
> >>>> field. Will it be faster than the first solution?
> >>>>
> >>>>     - I can do advanced things such as writing a custom
> >>>> HitCollector or a custom Query.
> >>>>
> >>>>     - I can add another field to each document, containing a
> >>>> computed custom score, then I could sort on that field. But I
> >>>> want to avoid this solution at all costs, since it would mean
> >>>> re-indexing all the documents each time the collection scores
> >>>> change.
> >>>>
> >>>>     What solution do you suggest? Is there another solution that
> >>>> I didn't mention?
> >>>>
> >>>>     More recent documents should also come first: in fact the
> >>>> final sorting should be a weighted sum of the collection score
> >>>> of an image and the date of an image: most recent images from
> >>>> the best-scored collections come first, then most recent from
> >>>> lower-scored collections, then less recent from best-scored,
> >>>> and so on. I would also like to be able to adjust the balance
> >>>> between date and collection score.
> >>>>
> >>>>     What solution do you suggest?
> >>>>
> >>>>
> >>>>     I would also like to implement random sorting. My solution
> >>>> is: I create 12 new fields R1 -> R12 for each document, each
> >>>> containing a random number between 1 and 12. To get a random
> >>>> sort, I sort each day with a different combination of
> >>>> R1 .. R12. For example:
> >>>>
> >>>>     Day 1: I sort by R1 then R4 then R5...
> >>>>     Day 2: I sort by R10 then R9 then R2...
> >>>>     etc.
> >>>>
> >>>> Is it a good solution? Is there another way to do it?
> >>>>
> >>>>
> >>>>     Many thanks in advance for your answers.
> >>>>
> >>>> Antoine
> >>>>
> >>>> -------------------------------------------------------------------
> >>>> --
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>>
