lucene-dev mailing list archives

From John Wang <john.w...@gmail.com>
Subject Re: Proposal: Scorer api change
Date Tue, 08 Jun 2010 15:41:51 GMT
Hi Shai:

    Similarity is in many cases not sufficient for scoring. For example, to
implement age decay of a document (very useful for corpora like news or
tweets), you want to project the raw tf-idf score onto a time curve, say
f(x). To do this, you'd write a custom scorer that decorates the underlying
scorer from, say, your boolean query:

public float score() throws IOException {
    return myFunc(innerScorer.score());
}
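As a sketch of what myFunc could look like, here is an exponential half-life curve. The class name AgeDecay, the half-life parameter and the timestamp arguments are all hypothetical and not part of Lucene. Note that a real decay needs the document's timestamp as well as the raw score; a decorating scorer would have to look that up from the current doc id (for example from a per-document cache of timestamps), which is left out here.

```java
// Hypothetical age-decay curve for myFunc: score * 0.5^(age / halfLife).
// The exponential shape and all names here are illustrative assumptions.
final class AgeDecay {
    private final double halfLifeMillis;

    AgeDecay(double halfLifeMillis) {
        if (halfLifeMillis <= 0) {
            throw new IllegalArgumentException("halfLifeMillis must be positive");
        }
        this.halfLifeMillis = halfLifeMillis;
    }

    /** Projects a raw (e.g. tf-idf) score onto the decay curve. */
    float apply(float rawScore, long docTimeMillis, long nowMillis) {
        // Clamp negative ages (future-dated docs) to zero so they are not boosted.
        long ageMillis = Math.max(0L, nowMillis - docTimeMillis);
        double factor = Math.pow(0.5, ageMillis / halfLifeMillis);
        return (float) (rawScore * factor);
    }
}
```

With a one-day half-life, a document that is one day old keeps half of its raw score and a two-day-old one keeps a quarter.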

    This is fine, but then you would have to do this as well:

public int nextDoc() throws IOException {
   return innerScorer.nextDoc();
}

and also:

public int advance(int target) throws IOException {
   return innerScorer.advance(target);
}

     The difference is that nextDoc and advance are called far more often
than score, and the decorator adds an extra method call to each of them,
which is not insignificant for queries that result in large recall sets.
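To make the overhead argument concrete, here is a self-contained toy sketch. ToyScorer is a stand-in that mimics the iteration methods of the Lucene 3.x Scorer (docID, nextDoc, advance, score); it is not the real class, and ArrayScorer and DecayingScorer are hypothetical. The decorator only wants to change score(), yet every nextDoc() and advance() call still has to pass through it, and the counter tallies how many calls are forwarded purely for iteration.

```java
// Toy stand-in for the Lucene 3.x Scorer contract; NOT org.apache.lucene.search.Scorer.
abstract class ToyScorer {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    abstract int docID();
    abstract int nextDoc();
    abstract int advance(int target);
    abstract float score();
}

// Hypothetical "raw" scorer over a sorted array of matching doc ids.
final class ArrayScorer extends ToyScorer {
    private final int[] docs;
    private final float[] scores;
    private int idx = -1; // -1 means "not started", as in Lucene's docID() convention

    ArrayScorer(int[] docs, float[] scores) { this.docs = docs; this.scores = scores; }

    int docID() { return idx < 0 ? -1 : (idx < docs.length ? docs[idx] : NO_MORE_DOCS); }
    int nextDoc() { if (idx < docs.length) idx++; return docID(); }
    int advance(int target) {
        int d;
        // Linear scan for brevity; real postings use skip data.
        while ((d = nextDoc()) < target) { }
        return d;
    }
    float score() { return scores[idx]; }
}

// Decorator that only changes score(), but must still forward iteration calls.
final class DecayingScorer extends ToyScorer {
    private final ToyScorer inner;
    private final float factor;     // hypothetical constant decay factor, for brevity
    int forwardedIterationCalls = 0; // pass-through nextDoc()/advance() calls

    DecayingScorer(ToyScorer inner, float factor) { this.inner = inner; this.factor = factor; }

    int docID() { return inner.docID(); }
    int nextDoc() { forwardedIterationCalls++; return inner.nextDoc(); }
    int advance(int target) { forwardedIterationCalls++; return inner.advance(target); }
    float score() { return factor * inner.score(); }
}
```

Iterating four matches to exhaustion costs five forwarded nextDoc() calls in the wrapper, none of which the decorator needs.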

Hope this makes sense.

Thanks

-John

On Tue, Jun 8, 2010 at 5:02 AM, Shai Erera <serera@gmail.com> wrote:

> I'm not sure I understand what you mean - Scorer is a DISI itself, and the
> scoring formula is mostly controlled by Similarity.
>
> What will be the benefits of the proposed change?
>
> Shai
>
>
> On Tue, Jun 8, 2010 at 8:25 AM, John Wang <john.wang@gmail.com> wrote:
>
>> Hi guys:
>>
>>     I'd like to propose changing the Scorer class/API to the
>> following:
>>
>>
>> public abstract class Scorer {
>>    public abstract DocIdSetIterator getDocIDSetIterator();
>>    public abstract float score(int docid);
>> }
>>
>> Reasons:
>>
>> 1) To build a Scorer from an existing Scorer (e.g. one that produces raw
>> scores from tf-idf), one would decorate it, which introduces overhead
>> (in function calls) around nextDoc and advance, even if you only want to
>> augment the score method, which is called far less often.
>>
>> 2) The current contract only allows scoring the current document of the
>> underlying iterator, so once the iterator has moved past a document, that
>> document can no longer be scored. In one of our use cases this is very
>> inconvenient.
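A toy sketch of the proposed shape, for illustration only: DocIterator, ProposedScorer, MapScorer and ScaledScorer are hypothetical stand-ins, not Lucene classes. The decorator overrides only score(int) and hands back the inner iterator unchanged, so nextDoc() is not wrapped (reason 1). Because score takes a doc id, a document can still be scored after the iterator has moved past it (reason 2).

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for DocIdSetIterator; NOT the real Lucene class.
interface DocIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int nextDoc();
}

// Shape of the proposal: iteration and scoring are separate; score takes a doc id.
abstract class ProposedScorer {
    abstract DocIterator iterator();
    abstract float score(int docId);
}

// Hypothetical raw scorer: a sorted doc list plus a doc id -> score map.
final class MapScorer extends ProposedScorer {
    private final int[] docs;
    private final Map<Integer, Float> scores = new HashMap<>();

    MapScorer(int[] sortedDocs, float[] rawScores) {
        this.docs = sortedDocs;
        for (int i = 0; i < sortedDocs.length; i++) scores.put(sortedDocs[i], rawScores[i]);
    }

    DocIterator iterator() {
        return new DocIterator() {
            private int idx = -1;
            public int docID() { return idx < 0 ? -1 : (idx < docs.length ? docs[idx] : NO_MORE_DOCS); }
            public int nextDoc() { if (idx < docs.length) idx++; return docID(); }
        };
    }

    // Non-matching docs score 0 in this toy.
    float score(int docId) { return scores.getOrDefault(docId, 0f); }
}

// Decorator: overrides only score(int); iteration goes straight to the inner iterator.
final class ScaledScorer extends ProposedScorer {
    private final ProposedScorer inner;
    private final float factor;

    ScaledScorer(ProposedScorer inner, float factor) { this.inner = inner; this.factor = factor; }

    DocIterator iterator() { return inner.iterator(); }
    float score(int docId) { return factor * inner.score(docId); }
}
```

The design choice is that the wrapper never sits on the iteration path; the trade-off is that every scorer must be able to score an arbitrary doc id, not just the current one.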
>>
>> What do you think? I can go ahead and open an issue and work on a patch if
>> I get some agreement.
>>
>> Thanks
>>
>> -John
>>
>
>
