lucene-dev mailing list archives

From Chris Hostetter
Subject Re: Usefulness of Similarity.queryNorm()
Date Wed, 13 Feb 2008 08:24:21 GMT

: It's the *same* coefficient for all sub-clauses, so it shouldn't affect
: rankings, BUT...  relative rankings *will* be affected if some inner clauses
: have custom boost values.

For things like ConstantScoreQuery and BoostingQuery, or any user-created 
Query classes that try to return specific, meaningful score values, 
queryNorm may not make sense.

According to Doug on Jan-12 2006...

>> The tf(), idf(), lengthNorm() and queryNorm() are directly from the 
>> cosine measure, although lengthNorm()'s default implementation uses an  
>> approximation.

that's why it was put in.  is it useful?  "eh" ... it seems to 
be.  in doing some email searches to try and jog my memory on this topic, 
i found the comment below, apparently made by me....

>> As i recall, a more practical purpose for the queryNorm is that when
>> dealing with large complex query structures consisting of "container"
>> queries (BooleanQueries, DisjunctionMaxQueries, SpanNearQueries, 
>> etc...) the queryNorm is applied to the "leaf" queries as the 
>> computation proceeds, which helps keep the scores from getting 
>> unmanageably large (and losing precision) as they are aggregated up. 
>> when dealing with floats, where 0<n<1 ...
>>   A*n + B*n + C*n + ... + Z*n
>> ...results in a more "precise" calculation than...
>>   (A + B + C + ... + Z)*n
>> ...correct? 

...but that might be a load of crap, no one smarter than me ever verified 
that, and i can't find where i got the impression that that was actually 
one of the reasons for queryNorm.
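
A minimal sketch for checking that precision claim empirically, not taken from Lucene: it computes both forms in float and compares them with a double reference. The class name, method names, and sample values are all hypothetical, and the sketch makes no assumption about which form comes out closer.

```java
// Hypothetical helper for comparing the two summation orders from the
// quoted comment. This is not Lucene code.
final class QueryNormPrecisionSketch {

    // A*n + B*n + ... + Z*n, accumulated in float.
    static float scaleThenSum(float[] scores, float n) {
        float total = 0f;
        for (float s : scores) {
            total += s * n;
        }
        return total;
    }

    // (A + B + ... + Z)*n, accumulated in float.
    static float sumThenScale(float[] scores, float n) {
        float total = 0f;
        for (float s : scores) {
            total += s;
        }
        return total * n;
    }

    // Same quantity accumulated in double, used as the comparison baseline.
    static double reference(float[] scores, float n) {
        double total = 0.0;
        for (float s : scores) {
            total += (double) s;
        }
        return total * (double) n;
    }

    public static void main(String[] args) {
        // Arbitrary sample leaf scores and an arbitrary norm with 0 < n < 1.
        float[] scores = {12.5f, 0.003f, 871.25f, 3.1f, 0.0471f};
        float n = 0.0371f;
        System.out.println("scale then sum:   " + scaleThenSum(scores, n));
        System.out.println("sum then scale:   " + sumThenScale(scores, n));
        System.out.println("double reference: " + reference(scores, n));
    }
}
```

Running it over a range of score magnitudes and values of n would show how far apart the two float results actually get relative to the reference.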

: It seems to me, conceptually, like code that claims to perform "normalization"
: shouldn't be able to affect rankings.  However, because of this side effect of
: incorporating boost at the normalization stage, it can.

I'm not following your argument ... i can see how a non-standard Query 
class might cause some interesting things to happen if you change the 
queryNorm() impl, but i can't think of any way that rankings of results 
from BooleanQueries of TermQueries and PhraseQueries (etc.) would change if 
you used a different function.
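
The point can be illustrated with a toy model, which is an assumption for illustration and not Lucene's scoring code: each document's score is a sum of per-clause contributions, and every clause multiplies its boost and raw score by the same norm. Changing the norm then rescales every score by the same positive factor and leaves the order alone. A hypothetical clause that ignores the norm (the `usesNorm` flag below, standing in for a non-standard Query class) is what lets the order shift.

```java
import java.util.Arrays;

// Toy scoring model, not Lucene's implementation. All names are hypothetical.
final class QueryNormRankingSketch {

    // Score of one document: sum over clauses of factor * boost * raw,
    // where factor is the norm for clauses that honor it and 1 otherwise.
    static float score(float[] raw, float[] boosts, boolean[] usesNorm, float norm) {
        float sum = 0f;
        for (int i = 0; i < raw.length; i++) {
            float factor = usesNorm[i] ? norm : 1f;
            sum += factor * boosts[i] * raw[i];
        }
        return sum;
    }

    // Document indices ordered by descending score.
    static Integer[] rank(float[][] docs, float[] boosts, boolean[] usesNorm, float norm) {
        Integer[] order = new Integer[docs.length];
        float[] scores = new float[docs.length];
        for (int d = 0; d < docs.length; d++) {
            order[d] = d;
            scores[d] = score(docs[d], boosts, usesNorm, norm);
        }
        Arrays.sort(order, (a, b) -> Float.compare(scores[b], scores[a]));
        return order;
    }

    public static void main(String[] args) {
        // Rows are documents, columns are raw per-clause scores (made-up values).
        float[][] docs = {{4f, 0f}, {0f, 1f}, {1f, 0f}};
        float[] boosts = {1f, 2f};
        boolean[] allUseNorm = {true, true};
        boolean[] secondIgnoresNorm = {true, false};

        for (float norm : new float[] {1f, 0.25f}) {
            System.out.println("norm=" + norm
                + " uniform: " + Arrays.toString(rank(docs, boosts, allUseNorm, norm))
                + " mixed: " + Arrays.toString(rank(docs, boosts, secondIgnoresNorm, norm)));
        }
    }
}
```

In this model the norm value stands in for whatever queryNorm() returns; the ranking depends on it only when some clause fails to apply it.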

