lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Earwin Burrfoot <ear...@gmail.com>
Subject Re: Proposal: Scorer api change
Date Tue, 08 Jun 2010 17:12:44 GMT
Shai, his wrapper Scorer will just look like:
DISI getDISI() {
  return delegate.getDISI();
}

float score(int doc) {
  return calcMyAwesomeScore(doc);
}

this saves delegate.nextDoc(), delegate.advance() indirection calls.
But I already offered a better alternative :)

On Tue, Jun 8, 2010 at 21:09, Shai Erera <serera@gmail.com> wrote:
> I guess I must be missing something fundamental here :).
>
> If Scorer is defined as you propose, and I create my Scorer which impls
> getDISI() as "return this" - what do I lose? What's wrong w/ Scorer already
> being a DISI?
>
> You mention "it is just inefficient to pay the method call overhead ..." -
> what overhead? Are you talking about the decorator delegating the call to
> the wrapped scorer? I really think the compiler can handle that, no?
> Especially if you make your nextDoc/advance final (which probably you
> should) ...
>
> That doesn't seem to justify an API change, break bw completely (even if we
> do it in 4.0 only) and change all the current Scorers ...
>
> Shai
>
> On Tue, Jun 8, 2010 at 8:01 PM, John Wang <john.wang@gmail.com> wrote:
>>
>> re: But Scorer is itself an iterator, so what prevents you from calling
>> nextDoc and advance on it without score()
>>
>> Nothing. It is just inefficient to pay the method call overhead just to
>> overload score.
>>
>> re: If I were in your shoes, I'd simply provider a Query wrapper. If CSQ
>> is not good enough I'd just develop my own.
>>
>> That is what I am doing. I am just proposing the change (see my first
>> email) as an improvement.
>>
>> re: Scorer is itself an iterator
>>
>> yes, that is the current definition. The point of the proposal is to make
>> this change.
>>
>> -John
>>
>> On Tue, Jun 8, 2010 at 9:45 AM, Shai Erera <serera@gmail.com> wrote:
>>>
>>> Well … I don't know the reason as well and always thought Scorer and
>>> Similarity are confusing.
>>>
>>> But Scorer is itself an iterator, so what prevents you from calling
>>> nextDoc and advance on it without score(). And what would the returned
>>> DISI do when nextDoc is called, if not delegate to its subs?
>>>
>>> If I were in your shoes, I'd simply provider a Query wrapper. If CSQ
>>> is not good enough I'd just develop my own.
>>>
>>> But perhaps others think differently?
>>>
>>> Shai
>>>
>>> On Tuesday, June 8, 2010, John Wang <john.wang@gmail.com> wrote:
>>> > Hi Shai:
>>> >     I am not sure I understand how changing Similarity would solve this
>>> > problem, wouldn't you need the reader?
>>> >     As for PayloadTermQuery, payload is not always the most efficient
>>> > way of storing such data, especially when number of terms << numdocs.
(I am
>>> > not sure accessing the payload when you iterate is a good idea, but that
is
>>> > another discussion)
>>> >
>>> >     Yes, what I described is exactly a simple CustomScoreQuery for a
>>> > special use-case. The problem is also in CustomScoreQuery, where nextDoc
and
>>> > advance are calling the sub-scorers as a wrapper. This can be avoided if
the
>>> > Scorer returns an iterator instead.
>>> >
>>> >     Separating scoring and doc iteration is a good idea anyway. I don't
>>> > know the reason to combine them originally.
>>> > Thanks
>>> > -John
>>> >
>>> >
>>> > On Tue, Jun 8, 2010 at 8:47 AM, Shai Erera <serera@gmail.com> wrote:
>>> >
>>> > So wouldn't it make sense to add some method to Similarity? Which
>>> > receives the doc Id in question maybe ... just thinking here.
>>> >
>>> > Factoring Scorer like you propose would create 3 objects for
>>> > scoring/iterating: Scorer (which really becomes an iterator), Similarity
and
>>> > CustomScoreFunction ...
>>> >
>>> > Maybe you can use CustomScoreQuery? or PayloadTermQuery? depends how
>>> > you compute your age decay function (where you pull the data about the age
>>> > of the document).
>>> >
>>> > Shai
>>> >
>>> >
>>> > On Tue, Jun 8, 2010 at 6:41 PM, John Wang <john.wang@gmail.com> wrote:
>>> > Hi Shai:
>>> >     Similarity in many cases is not sufficient for scoring. For
>>> > example, to implement age decaying of a document (very useful for corpuses
>>> > like news or tweets), you want to project the raw tfidf score onto a time
>>> > curve, say f(x), to do this, you'd have a custom scorer that decorates the
>>> > underlying scorer from your say, boolean query:
>>> >
>>> >
>>> >
>>> > public float score(){    return myFunc(innerScorer.score());}
>>> >     This is fine, but then you would have to do this as well:
>>> > public int nextDoc(){
>>> >
>>> >
>>> >    return innerScorer.nextDoc();}
>>> > and also:
>>> > public int advance(int target){   return innerScorer.advance();}
>>> > The difference here is that nextDoc and advance are called far more times
as
>>> > score. And you are introducing an extra method call for them, which is not
>>> > insignificant for queries result in large recall sets.
>>> >
>>> >
>>> >
>>> > Hope this makes sense.
>>> > Thanks
>>> > -John
>>> > On Tue, Jun 8, 2010 at 5:02 AM, Shai Erera <serera@gmail.com> wrote:
>>> > I'm not sure I understand what you mean - Scorer is a DISI itself, and
>>> > the scoring formula is mostly controlled by Similarity.
>>> >
>>> > What will be the benefits of the proposed change?
>>> >
>>> > Shai
>>> >
>>> > On Tue, Jun 8, 2010 at 8:25 AM, John Wang <john.wang@gmail.com> wrote:
>>> >
>>> >
>>> >
>>> >
>>> > Hi guys:
>>> >
>>> >     I'd like to make a proposal to change the Scorer class/api to the
>>> > following:
>>> >
>>> >
>>> > public abstract class Scorer{
>>> >    DocIdSetIterator getDocIDSetIterator();
>>> >    float score(int docid);
>>> > }
>>> >
>>> > Reasons:
>>> >
>>> > 1) To build a Scorer from an existing Scorer (e.g. that produces raw
>>> > scores from tfidf), one would decorate it, and it would introduce overhead
>>> > (in function calls) around nextDoc and advance, even if you just want to
>>> > augment the score method which is called much fewer times.
>>> >
>>> > 2) The current contract forces scoring on the currentDoc in the
>>> > underlying iterator. So once you pass "current", you can no longer score.
In
>>> > one of our use-cases, it is very inconvenient.
>>> >
>>> > What do you think? I can go ahead and open an issue and work on a patch
>>> > if I get some agreement.
>>> >
>>> > Thanks
>>> >
>>> > -John
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message