lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Proposal: Scorer api change
Date Wed, 09 Jun 2010 12:40:49 GMT
So just to make sure I understand:

A Matcher is paired w/ a Scorer, and this pairing is done at Query
construction time ... e.g. if I use QP to construct the Query, I'd need to
extend QP by providing my custom scorer for relevant Matchers (and reuse the
scorers logic for the other fragments), and if I programmatically create a
Query, I'll need to pair its Matcher w/ a Scorer. Is that what you meant?

How is that different from today's API? At a high level, someone can extend
BQ and override createScorer ... if Scorer were just the Scorer and BQ had a
Matcher ...

BTW, re the note on BM25BQ -- do you think a BM25 Scorer can fit all query
types? I.e. would you reuse the same code for
Boolean/Term/Phrase/SpanQuery, or would you need to write a separate BM25
scoring algorithm for each Query type? I'm asking this assuming we
have a Matcher and Scorer decoupling.

If you can indeed have one BM25 scoring algorithm that fits all Query types,
which means it's quite agnostic to the Query executed, and only cares about
the doc id, and maybe some independent data it can fetch about it from
elsewhere, then I agree that the current API is not nicely extensible. But
if not, then I don't see how the Matcher/Scorer change would improve that.

Perhaps we should describe 2-3 queries, the resulting query trees and how they
are evaluated today vs. with the Matcher/Scorer approach? It's always easier to
talk about something when you have an example :)
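
To get that started, here is a rough Java sketch of how I read the
two-parallel-trees idea for one query, "a OR b", where b should match but
not contribute a real score. Every name in it (Matcher, Scorer, TermMatcher,
OrMatcher, TermFreqScorer, SumScorer, ConstantScorer) is made up for
illustration and is not an existing Lucene class. The "index" is just an
array of term frequencies, and raw tf stands in for a real scoring formula.

```java
import java.util.Arrays;
import java.util.List;

// All types below are hypothetical; none of them is an existing Lucene class.
class MatcherScorerSketch {

    // Matching-tree node: decides whether a doc matches, knows nothing about scores.
    interface Matcher {
        boolean matches(int doc);
    }

    // Scoring-tree node: computes a score, typically by looking at its matcher's state.
    interface Scorer {
        float score(int doc);
    }

    // Toy term matcher: freqs[doc] is the term frequency in that doc (0 = term absent).
    static final class TermMatcher implements Matcher {
        private final int[] freqs;
        TermMatcher(int... freqs) { this.freqs = freqs; }
        public boolean matches(int doc) {
            return doc >= 0 && doc < freqs.length && freqs[doc] > 0;
        }
        int freq(int doc) { return matches(doc) ? freqs[doc] : 0; }
    }

    // Disjunction over child matchers; usable on its own as a pure filter.
    static final class OrMatcher implements Matcher {
        private final List<Matcher> clauses;
        OrMatcher(List<Matcher> clauses) { this.clauses = clauses; }
        public boolean matches(int doc) {
            for (Matcher m : clauses) {
                if (m.matches(doc)) return true;
            }
            return false;
        }
    }

    // Scorer specialized on TermMatcher: raw tf is a placeholder for a real formula.
    static final class TermFreqScorer implements Scorer {
        private final TermMatcher matcher;
        TermFreqScorer(TermMatcher matcher) { this.matcher = matcher; }
        public float score(int doc) { return matcher.freq(doc); }
    }

    // Scorer that works with any Matcher: a fixed value when the branch matches.
    static final class ConstantScorer implements Scorer {
        private final Matcher matcher;
        private final float value;
        ConstantScorer(Matcher matcher, float value) {
            this.matcher = matcher;
            this.value = value;
        }
        public float score(int doc) { return matcher.matches(doc) ? value : 0f; }
    }

    // Scorer for a disjunction: sums the scores of its child scorers.
    static final class SumScorer implements Scorer {
        private final List<Scorer> children;
        SumScorer(List<Scorer> children) { this.children = children; }
        public float score(int doc) {
            float sum = 0f;
            for (Scorer s : children) sum += s.score(doc);
            return sum;
        }
    }

    public static void main(String[] args) {
        TermMatcher a = new TermMatcher(0, 2, 1); // tf of "a" in docs 0..2
        TermMatcher b = new TermMatcher(3, 0, 1); // tf of "b" in docs 0..2

        // Matching tree: a OR b.
        Matcher matcher = new OrMatcher(Arrays.<Matcher>asList(a, b));

        // Scoring tree: same shape, but the branch for b is a constant score.
        Scorer scorer = new SumScorer(Arrays.<Scorer>asList(
                new TermFreqScorer(a), new ConstantScorer(b, 1f)));

        for (int doc = 0; doc < 3; doc++) {
            if (matcher.matches(doc)) {
                System.out.println("doc " + doc + " score " + scorer.score(doc));
            }
        }
    }
}
```

Here the pairing happens where the two trees are built (main), not inside a
Query subclass, and swapping the branch for b to a constant score leaves the
matching tree untouched. Note that TermFreqScorer still has to know it is
paired with a TermMatcher, which is the part of my question above that I'd
like to see worked through for Phrase/Span as well.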

Shai

On Wed, Jun 9, 2010 at 3:16 PM, Earwin Burrfoot <earwin@gmail.com> wrote:

> What I have in mind is basically having two parallel trees - one for
> matching, one for scoring.
> Matching tree is completely independent and can be used as a filter
> with sort-by-field approach, for example.
> Scoring tree nodes have references to corresponding matching tree
> nodes, so they can exploit their "current state".
>
> Both trees are built with a visitor over some AST produced from
> textual query, or programmatically.
> So what you have to do is to write said visitors. Some of the basic
> scorers can be reused by your custom visitor, so voila - we have nice
> extensibility by composition, instead of extensibility by inheritance
> (which sucks). Also, all this custom code is gathered in a single
> class, instead of being spread over your query derivatives.
> This is not a final design, lots of things can differ. I.e. - trees
> don't have to be parallel. If we want some query branch to not affect
> the score, but do matching, we're currently wrapping it in
> ConstantScoreQuery, in my design the matcher tree will look as is, but
> corresponding scorer tree branch will be replaced by ConstantScore.
>
> 2010/6/9 Shai Erera <serera@gmail.com>:
> > I don't feel comfortable with the statement "these visitors are then free to
> > specialize on matchers or not ...". Let's think how this API will be used ..
> > today, the user has two hooks - the QueryParser and Collector. Collector
> > allows you to plug in your own and by extending QP you can return your own
> > Query for different fragments.
> >
> > The Query is a full set though - Query + Weight + Scorer. Whether you extend
> > an existing query and just override one of the methods is up to you, but
> > still the Query is self contained.
> >
> > If we break the Query API down to a Matcher and Scorer, how will you provide
> > your own Scorer? Collector is independent of the Query - it just collects
> > the results. Will the Scorer be independent of Query too (and become an
> > IndexSearcher.search() argument)? I don't think so, 'cause you want to know
> > which Matcher you're up against in order to write a good Scorer. There's no
> > point passing in a PhraseScorer if the query does not include any
> > PhraseMatcher. So will you need to extend Query, to return your own custom
> > Scorer, for certain fragments? Can't you do it today already (given the API
> > is not final, is public/protected etc.)
> >
> > Earwin - is that what you had in mind? If so, let's think first if the
> > current API is not sufficient, given that we 'open' it for extension ...
> > e.g., can someone achieve that by extending PhraseQuery, override
> > createScorer and return his own? Do we need more than that?
> >
> > I'm not saying we should refactor the API to Matcher + Scorer, just thinking
> > on what do we really need to do and what's the best way to achieve that.
> >
> > Shai
> >
> > On Wed, Jun 9, 2010 at 2:24 PM, Earwin Burrfoot <earwin@gmail.com> wrote:
> >>
> >> > Can we represent the Query
> >> > state in some general structure, that no matter which Query you get,
> >> > you'll know how to score it?
> >>
> >> No. You could go for unified interface that allows you to express
> >> different query states, like a set of untyped key-values, but you'll
> >> end up switching on these keyvalues in the end.
> >>
> >> It's better to define a set of matchers, and then produce visitors
> >> that compute scores. These visitors are then free to specialize on
> >> matchers or not, or ignore the whole tree completely.
> >>
> >> --
> >> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
> >> Phone: +7 (495) 683-567-4
> >> ICQ: 104465785
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: dev-help@lucene.apache.org
> >>
