lucene-java-user mailing list archives

From "Peter Keegan" <peterlkee...@gmail.com>
Subject Re: Sorting by Score
Date Wed, 28 Feb 2007 20:06:21 GMT
Erick,

Yes, this seems to be the simplest way to implement score 'bucketization',
but wouldn't it be more efficient to do this with a custom ScoreComparator?
That way, you'd do the bucketizing and sorting in one 'step' (compare()).
The savings may not be measurable, though. A comparator might also allow
one to do a more sophisticated rounding or bucketizing since you'd be
getting 2 scores at a time.
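
To make the idea concrete, here is a rough sketch in plain Java of bucketizing and secondary sorting inside a single compare(). It does not use Lucene's comparator interfaces: Hit, its title field, and BucketedScoreComparator are hypothetical names made up for illustration, and the equal-width bucketing against the max score is an assumption, not necessarily the rounding anyone in this thread uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a scored hit; this is not a Lucene class.
final class Hit {
    final int doc;
    final float score;
    final String title; // hypothetical secondary sort key

    Hit(int doc, float score, String title) {
        this.doc = doc;
        this.score = score;
        this.title = title;
    }
}

// Compares hits by score bucket first (highest bucket first), then by title.
final class BucketedScoreComparator implements Comparator<Hit> {
    private final float maxScore;
    private final int buckets;

    BucketedScoreComparator(float maxScore, int buckets) {
        this.maxScore = maxScore;
        this.buckets = buckets;
    }

    // Maps a raw score to a bucket index in [0, buckets - 1].
    // Assumes equal-width buckets relative to the maximum score.
    int bucket(float score) {
        if (maxScore <= 0f) {
            return 0;
        }
        int b = (int) (score / maxScore * buckets);
        return Math.min(Math.max(b, 0), buckets - 1);
    }

    @Override
    public int compare(Hit a, Hit b) {
        // Bucketizing happens here, on both scores at once.
        int byBucket = Integer.compare(bucket(b.score), bucket(a.score));
        if (byBucket != 0) {
            return byBucket;
        }
        // Same bucket: fall through to the secondary sort.
        return a.title.compareTo(b.title);
    }

    // Returns a sorted copy; the input list is left untouched.
    static List<Hit> sort(List<Hit> hits, float maxScore, int buckets) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(new BucketedScoreComparator(maxScore, buckets));
        return copy;
    }
}
```

Calling BucketedScoreComparator.sort(hits, maxScore, 5) orders the hits by bucket and then by title without rewriting any scores first.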

Peter


On 2/28/07, Erick Erickson <erickerickson@gmail.com> wrote:
>
> Empirically, when I insert the elements in the FieldSortedHitQueue
> they get sorted according to the Sort object. The original query
> that gives me a TopDocs applied no secondary sorting, only
> relevancy. Since I normalized all the scores into one of only
> 5 discrete values, secondary sorting was applied to all docs
> with the same score when I inserted them in the
> FieldSortedHitQueue.
>
> Now things popped off the FieldSortedHitQueue are ordered the
> way I want.
>
> You could just operate on the FieldSortedHitQueue at this point, but
> I decided the rest of my code would be simpler if I stuffed them back
> into the TopDocs, so there's some explanation below that you can
> just skip if I've cleared things up already.....
>
> *****************
> The step I left out is moving the documents from the
> FieldSortedHitQueue back to topDocs.scoreDocs.
> So the steps are as follows:
>
> 1> "bucketize" the scores. That is, go through the
> TopDocs.scoreDocs and adjust each raw score into
> one of my buckets. This is made easy by the
> existence of topDocs.getMaxScore. TopDocs has
> had no sorting other than relevancy applied so far.
>
> 2> assemble the FieldSortedHitQueue by inserting
> each element from scoreDocs into it, with a suitable
> Sort object whose first field is relevance (SortField.FIELD_SCORE).
>
> 3> pop the entries off the FieldSortedHitQueue, overwriting
> the elements in topDocs.scoreDocs.
>
> I left out step <3>, although I suppose you could
> operate directly on the FieldSortedHitQueue.
>
> NOTE: in my case, I just put everything back in the
> scoreDocs without attempting any efficiencies. If I
> needed more performance, I'd only put as many items
> back as I needed to display. But as I wrote yesterday,
> performance isn't an issue so there's no point. Although
> I know one place to look if we need to squeeze more QPS.
>
> How efficient this is is an open question. But it's "fast enough"
> and relatively simple so I stopped looking for more
> efficiencies....
>
> Erick
>
> On 2/28/07, Chris Hostetter <hossman_lucene@fucit.org> wrote:
> >
> >
> > : The first part was just to iterate through the TopDocs that's available to
> > : me and normalize the scores right in the ScoreDocs. Like this...
> >
> > Won't that be done after Lucene does the hit collecting/sorting? ... he
> > wants the "bucketing" to happen as part of the scoring so that the
> > secondary sort will determine the ordering within the bucket.
> >
> > (or am i missing something about your description?)
> >
> >
> >
> >
> > -Hoss
>

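For comparison, the three steps Erick describes in the quoted message can be sketched roughly as follows. This is plain Java, not Lucene code: java.util.PriorityQueue stands in for FieldSortedHitQueue, and ScoredDoc, sortKey, and BucketThenQueue are hypothetical names. Replacing each score with its bucket index is also an assumption about what "bucketize" means here.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical stand-in for a ScoreDoc plus one secondary sort field.
final class ScoredDoc {
    final int doc;
    float score;          // mutable so step 1 can overwrite it
    final String sortKey; // hypothetical secondary sort key

    ScoredDoc(int doc, float score, String sortKey) {
        this.doc = doc;
        this.score = score;
        this.sortKey = sortKey;
    }
}

final class BucketThenQueue {
    // Step 1: overwrite each raw score with its bucket index (0 .. buckets - 1).
    static void bucketize(ScoredDoc[] docs, float maxScore, int buckets) {
        for (ScoredDoc d : docs) {
            int b = maxScore <= 0f ? 0 : (int) (d.score / maxScore * buckets);
            d.score = Math.min(Math.max(b, 0), buckets - 1);
        }
    }

    // Steps 2 and 3: push everything into a queue ordered by
    // (bucketed score descending, sortKey ascending), then pop the
    // entries back into the original array in that order.
    static void reorder(ScoredDoc[] docs, float maxScore, int buckets) {
        bucketize(docs, maxScore, buckets);

        Comparator<ScoredDoc> order = (a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : a.sortKey.compareTo(b.sortKey);
        };

        PriorityQueue<ScoredDoc> queue =
                new PriorityQueue<>(Math.max(1, docs.length), order);
        for (ScoredDoc d : docs) {
            queue.add(d);
        }
        for (int i = 0; i < docs.length; i++) {
            docs[i] = queue.poll();
        }
    }
}
```

The difference from the comparator idea is that the scores are rewritten up front, so the queue's comparator only ever sees the already-bucketed values.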