lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shai Erera <ser...@gmail.com>
Subject Re: Proposal: Scorer api change
Date Tue, 08 Jun 2010 17:20:36 GMT
Yeah I got what he meant, but I honestly don't think those delegate calls
are an overhead ...

Shai

On Tue, Jun 8, 2010 at 8:12 PM, Earwin Burrfoot <earwin@gmail.com> wrote:

> Shai, his wrapper Scorer will just look like:
> DISI getDISI() {
>  return delegate.getDISI();
> }
>
> float score(int doc) {
>  return calcMyAwesomeScore(doc);
> }
>
> this saves delegate.nextDoc(), delegate.advance() indirection calls.
> But I already offered a better alternative :)
>
> On Tue, Jun 8, 2010 at 21:09, Shai Erera <serera@gmail.com> wrote:
> > I guess I must be missing something fundamental here :).
> >
> > If Scorer is defined as you propose, and I create my Scorer which impls
> > getDISI() as "return this" - what do I lose? What's wrong w/ Scorer
> already
> > being a DISI?
> >
> > You mention "it is just inefficient to pay the method call overhead ..."
> -
> > what overhead? Are you talking about the decorator delegating the call to
> > the wrapped scorer? I really think the compiler can handle that, no?
> > Especially if you make your nextDoc/advance final (which probably you
> > should) ...
> >
> > That doesn't seem to justify an API change, break bw completely (even if
> we
> > do it in 4.0 only) and change all the current Scorers ...
> >
> > Shai
> >
> > On Tue, Jun 8, 2010 at 8:01 PM, John Wang <john.wang@gmail.com> wrote:
> >>
> >> re: But Scorer is itself an iterator, so what prevents you from calling
> >> nextDoc and advance on it without score()
> >>
> >> Nothing. It is just inefficient to pay the method call overhead just to
> >> overload score.
> >>
> >> re: If I were in your shoes, I'd simply provider a Query wrapper. If CSQ
> >> is not good enough I'd just develop my own.
> >>
> >> That is what I am doing. I am just proposing the change (see my first
> >> email) as an improvement.
> >>
> >> re: Scorer is itself an iterator
> >>
> >> yes, that is the current definition. The point of the proposal is to
> make
> >> this change.
> >>
> >> -John
> >>
> >> On Tue, Jun 8, 2010 at 9:45 AM, Shai Erera <serera@gmail.com> wrote:
> >>>
> >>> Well ... I don't know the reason as well and always thought Scorer and
> >>> Similarity are confusing.
> >>>
> >>> But Scorer is itself an iterator, so what prevents you from calling
> >>> nextDoc and advance on it without score(). And what would the returned
> >>> DISI do when nextDoc is called, if not delegate to its subs?
> >>>
> >>> If I were in your shoes, I'd simply provider a Query wrapper. If CSQ
> >>> is not good enough I'd just develop my own.
> >>>
> >>> But perhaps others think differently?
> >>>
> >>> Shai
> >>>
> >>> On Tuesday, June 8, 2010, John Wang <john.wang@gmail.com> wrote:
> >>> > Hi Shai:
> >>> >     I am not sure I understand how changing Similarity would solve
> this
> >>> > problem, wouldn't you need the reader?
> >>> >     As for PayloadTermQuery, payload is not always the most efficient
> >>> > way of storing such data, especially when number of terms <<
numdocs.
> (I am
> >>> > not sure accessing the payload when you iterate is a good idea, but
> that is
> >>> > another discussion)
> >>> >
> >>> >     Yes, what I described is exactly a simple CustomScoreQuery for
a
> >>> > special use-case. The problem is also in CustomScoreQuery, where
> nextDoc and
> >>> > advance are calling the sub-scorers as a wrapper. This can be avoided
> if the
> >>> > Scorer returns an iterator instead.
> >>> >
> >>> >     Separating scoring and doc iteration is a good idea anyway. I
> don't
> >>> > know the reason to combine them originally.
> >>> > Thanks
> >>> > -John
> >>> >
> >>> >
> >>> > On Tue, Jun 8, 2010 at 8:47 AM, Shai Erera <serera@gmail.com>
wrote:
> >>> >
> >>> > So wouldn't it make sense to add some method to Similarity? Which
> >>> > receives the doc Id in question maybe ... just thinking here.
> >>> >
> >>> > Factoring Scorer like you propose would create 3 objects for
> >>> > scoring/iterating: Scorer (which really becomes an iterator),
> Similarity and
> >>> > CustomScoreFunction ...
> >>> >
> >>> > Maybe you can use CustomScoreQuery? or PayloadTermQuery? depends how
> >>> > you compute your age decay function (where you pull the data about
> the age
> >>> > of the document).
> >>> >
> >>> > Shai
> >>> >
> >>> >
> >>> > On Tue, Jun 8, 2010 at 6:41 PM, John Wang <john.wang@gmail.com>
> wrote:
> >>> > Hi Shai:
> >>> >     Similarity in many cases is not sufficient for scoring. For
> >>> > example, to implement age decaying of a document (very useful for
> corpuses
> >>> > like news or tweets), you want to project the raw tfidf score onto
a
> time
> >>> > curve, say f(x), to do this, you'd have a custom scorer that
> decorates the
> >>> > underlying scorer from your say, boolean query:
> >>> >
> >>> >
> >>> >
> >>> > public float score(){    return myFunc(innerScorer.score());}
> >>> >     This is fine, but then you would have to do this as well:
> >>> > public int nextDoc(){
> >>> >
> >>> >
> >>> >    return innerScorer.nextDoc();}
> >>> > and also:
> >>> > public int advance(int target){   return innerScorer.advance();}
> >>> > The difference here is that nextDoc and advance are called far more
> times as
> >>> > score. And you are introducing an extra method call for them, which
> is not
> >>> > insignificant for queries result in large recall sets.
> >>> >
> >>> >
> >>> >
> >>> > Hope this makes sense.
> >>> > Thanks
> >>> > -John
> >>> > On Tue, Jun 8, 2010 at 5:02 AM, Shai Erera <serera@gmail.com>
wrote:
> >>> > I'm not sure I understand what you mean - Scorer is a DISI itself,
> and
> >>> > the scoring formula is mostly controlled by Similarity.
> >>> >
> >>> > What will be the benefits of the proposed change?
> >>> >
> >>> > Shai
> >>> >
> >>> > On Tue, Jun 8, 2010 at 8:25 AM, John Wang <john.wang@gmail.com>
> wrote:
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > Hi guys:
> >>> >
> >>> >     I'd like to make a proposal to change the Scorer class/api to the
> >>> > following:
> >>> >
> >>> >
> >>> > public abstract class Scorer{
> >>> >    DocIdSetIterator getDocIDSetIterator();
> >>> >    float score(int docid);
> >>> > }
> >>> >
> >>> > Reasons:
> >>> >
> >>> > 1) To build a Scorer from an existing Scorer (e.g. that produces raw
> >>> > scores from tfidf), one would decorate it, and it would introduce
> overhead
> >>> > (in function calls) around nextDoc and advance, even if you just want
> to
> >>> > augment the score method which is called much fewer times.
> >>> >
> >>> > 2) The current contract forces scoring on the currentDoc in the
> >>> > underlying iterator. So once you pass "current", you can no longer
> score. In
> >>> > one of our use-cases, it is very inconvenient.
> >>> >
> >>> > What do you think? I can go ahead and open an issue and work on a
> patch
> >>> > if I get some agreement.
> >>> >
> >>> > Thanks
> >>> >
> >>> > -John
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: dev-help@lucene.apache.org
> >>>
> >>
> >
> >
>
>
>
> --
> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
> Phone: +7 (495) 683-567-4
> ICQ: 104465785
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Mime
View raw message