lucene-dev mailing list archives

From: Grant Ingersoll <>
Subject: Re: Whither Query Norm?
Date: Fri, 20 Nov 2009 21:59:01 GMT

On Nov 20, 2009, at 1:24 PM, Jake Mannix wrote:

> On Fri, Nov 20, 2009 at 10:08 AM, Grant Ingersoll <> wrote:
>>> I should add in my $0.02 on whether to just get rid of queryNorm() altogether:
>>> -1 from me, even though it's confusing, because having that call there (somewhere,
>>> at least) allows you to actually compare scores across queries if you do the extra
>>> work of properly normalizing the documents as well (at index time).
>> Do you have some references on this?  I'm interested in reading more on the subject.
>> I've never quite been sold on how it is meaningful to compare scores and would like
>> to read more opinions.
> References on how people do this *with Lucene*, or just how this is done in general?

in general.  Academic references, etc.
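
As an aside for anyone reading along: the queryNorm() call being debated above is the
per-query factor on Similarity.  Roughly, and from memory of the 2.9-era code, the defaults
look something like the sketch below (the class name is just for illustration, and the
method bodies should be checked against the real source rather than taken as authoritative):

import org.apache.lucene.search.DefaultSimilarity;

// A sketch that just restates the 2.9-era defaults, to make clear which side each factor
// normalizes.  Bodies are from memory; treat them as approximate.
public class SketchSimilarity extends DefaultSimilarity {

  // Query time: 1/sqrt(sum of squared term weights).  This scales the *query* vector, so
  // boosts within a query become relative rather than absolute.
  @Override
  public float queryNorm(float sumOfSquaredWeights) {
    return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
  }

  // Index time: 1/sqrt(number of terms in the field), quantized to a single byte and
  // stored in the norms.  This is the piece you would have to replace with a "real"
  // document-side normalization to get the cross-query comparability being discussed.
  @Override
  public float lengthNorm(String fieldName, int numTerms) {
    return (float) (1.0 / Math.sqrt(numTerms));
  }
}

So queryNorm() on its own only touches the query side; the "extra work of properly
normalizing the documents at index time" is exactly about making the document side behave.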

> There are lots of papers on fancy things which can be done, but I'm not sure where to
> point you to start out.  The technique I'm referring to is really just the simplest
> possible thing beyond setting your weights "by hand": assume you have a boolean OR
> query, Q, built up out of sub-queries q_i (hitting, for starters, different fields,
> although you can overlap as well with some more work), each with its own weight (boost)
> b_i.  If you then have a training corpus (good matches, bad matches, or ranked lists of
> matches in order of relevance for the queries at hand), *and* scores (at the q_i level)
> which are comparable, you can do a simple regression (linear or logistic, depending on
> whether you map your final scores to a logit or not) on the b_i to fit the best boosts
> to use.  What is critical here is that scores from different queries are comparable.
> If they're not, then queries where the best document scores 2.0 overly affect the
> training compared to queries where the best possible score is 0.5 (actually, wait,
> it's the reverse: you're training to increase the scores of matching documents, so the
> system tries to make that 0.5-scoring document score much higher by raising boosts
> higher and higher, while the good matches already scoring 2.0 don't need any more
> boosting, if that makes sense).

This makes sense mathematically, assuming scores are comparable.  What I would like to get
at is why anyone thinks scores are comparable across queries to begin with.  I agree it is
beneficial in some cases (as you described) if they are.  Probably a question better suited
to an academic IR list...
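
To make the fitting step described above concrete, here is a toy, self-contained sketch
(nobody's production code; the field names, scores, and labels are all made up) of a
bare-bones logistic-regression fit of the boosts b_i from per-sub-query scores and 0/1
relevance labels:

// Toy illustration of fitting boosts b_i by logistic regression with plain gradient
// descent.  Each training example is the vector of per-field sub-query scores for one
// (query, doc) pair, plus a 0/1 relevance label.  All data below is invented.
public class BoostFitSketch {

  public static void main(String[] args) {
    // Hypothetical per-field scores: {titleScore, bodyScore, anchorScore}
    double[][] scores = {
      {0.9, 0.2, 0.4},   // relevant
      {0.1, 0.3, 0.0},   // not relevant
      {0.7, 0.6, 0.5},   // relevant
      {0.2, 0.1, 0.1},   // not relevant
    };
    int[] labels = {1, 0, 1, 0};

    double[] boosts = new double[3];   // the b_i we are fitting, starting at 0
    double learningRate = 0.5;

    for (int iter = 0; iter < 2000; iter++) {
      double[] grad = new double[3];
      for (int n = 0; n < scores.length; n++) {
        double combined = 0.0;         // combined score = sum_i b_i * s_i
        for (int i = 0; i < 3; i++) {
          combined += boosts[i] * scores[n][i];
        }
        // squash the combined score through a logistic function ("map to a logit")
        double p = 1.0 / (1.0 + Math.exp(-combined));
        for (int i = 0; i < 3; i++) {
          grad[i] += (p - labels[n]) * scores[n][i];   // gradient of the logistic loss
        }
      }
      for (int i = 0; i < 3; i++) {
        boosts[i] -= learningRate * grad[i] / scores.length;
      }
    }

    System.out.println("fitted boosts: " + java.util.Arrays.toString(boosts));
  }
}

The comparability point shows up in the gradient: each update is weighted by the raw
sub-query scores, so if different queries produce scores on wildly different scales, the
queries on one scale pull the fitted boosts around far more than the others.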

> There are of course far more complex "state of the art" training techniques, but
> probably someone like Ted would be able to give a better list of references on where
> it's easiest to read about those.  But I can try to dredge up some places where I've
> read about doing this, and post again later if I can find any.
