lucene-dev mailing list archives

From Chris Hostetter <>
Subject Re: search quality - assessment & improvements
Date Wed, 18 Jul 2007 23:39:02 GMT

: Yes, actually:  1 / sqrt((1 - Slope) * Pivot + (Slope) * Doclen)

interesting ... it doesn't really seem like there is any direct
relationship between your average length (Pivot) and your Doclen --
on the surface, when i first read your example, it seemed to have more
to do with the shifting of the curve than with any intrinsic property of
the docs themselves and how their lengths relate to the pivot.

in my mind the key question is how the length norms of docs are affected
when they are equidistant from the pivot (one high, one low) ... in
theory you want the relative difference in length norm to be the same
regardless of what the average length is (ie: if the pivot is 100, the
lengthNorm ratio of a 90 word doc vs a 110 word doc should be the same
as between a 900 word doc and an 1100 word doc if the pivot is 1000),
right? ... and once you actually do the math, this equation seems to
satisfy it: scaling Pivot and Doclen by the same factor scales every
norm by the same constant, which cancels out of the ratio.
(which really confused me for about 10 minutes, but i'll go with it)
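
A quick way to check the ratio claim numerically is the sketch below. It is only an illustration of the quoted formula, not Lucene code: the function names are made up here, and the Slope value of 0.25 is an arbitrary assumption (the equality should hold for any Slope between 0 and 1).

```python
import math


def pivoted_length_norm(doc_len, pivot, slope):
    """Quoted formula: 1 / sqrt((1 - Slope) * Pivot + Slope * Doclen)."""
    return 1.0 / math.sqrt((1.0 - slope) * pivot + slope * doc_len)


def norm_ratio(short_len, long_len, pivot, slope):
    """Ratio of the shorter doc's norm to the longer doc's norm."""
    return (pivoted_length_norm(short_len, pivot, slope)
            / pivoted_length_norm(long_len, pivot, slope))


SLOPE = 0.25  # assumed value, chosen only for illustration

# 90 vs 110 words around a pivot of 100
small_ratio = norm_ratio(90, 110, 100, SLOPE)
# 900 vs 1100 words around a pivot of 1000
large_ratio = norm_ratio(900, 1100, 1000, SLOPE)

print(small_ratio, large_ratio)
print(math.isclose(small_ratio, large_ratio))
```

The two ratios agree because multiplying both Pivot and Doclen by 10 multiplies the term under the square root by 10 for both docs, so the factor divides out.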

However ... i still think that if you really want a length norm that takes
into account the average length of the docs, you want one that rewards
docs for being near the average ... it doesn't seem to make a lot of sense
to me to say that a doc whose length is N% longer than the
average length is significantly worse than a doc whose length is N% shorter
than the average length.
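
To make the asymmetry concrete, here is a small sketch that evaluates the quoted formula for a doc 50% shorter than the average, a doc of exactly average length, and a doc 50% longer. The pivot of 1000, the Slope of 0.25, and the 50% figure are all assumptions picked for illustration, and the function name is hypothetical.

```python
import math


def pivoted_length_norm(doc_len, pivot, slope):
    """Quoted formula: 1 / sqrt((1 - Slope) * Pivot + Slope * Doclen)."""
    return 1.0 / math.sqrt((1.0 - slope) * pivot + slope * doc_len)


PIVOT = 1000   # assumed average doc length
SLOPE = 0.25   # assumed slope
PERCENT = 50   # the "N%" from the paragraph above

shorter = pivoted_length_norm(PIVOT * (1 - PERCENT / 100), PIVOT, SLOPE)
average = pivoted_length_norm(PIVOT, PIVOT, SLOPE)
longer = pivoted_length_norm(PIVOT * (1 + PERCENT / 100), PIVOT, SLOPE)

# The norm only ever decreases as the doc gets longer, so the shorter doc
# outranks the average doc, which in turn outranks the longer doc.
print(shorter > average > longer)
```

With a positive Slope the norm falls steadily as Doclen grows, so nothing in this formula rewards being close to the average; the pivot only controls how steep the penalty is.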

