lucene-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: search quality - assessment & improvements
Date Fri, 20 Jul 2007 02:41:40 GMT

: (d) Now we might get stupid (or erroneous)
:     few words docs as top results;
: (e) To solve this, pivoted doc-length-norm punishes too
:     long docs (longer than the average) but only slightly
:     rewards docs that are shorter than the average.

I get that your calculation is much more gradual than the 1/sqrt(length),
so extremely short docs are "only slightly" rewarded over average length
docs ... i'm just not clear on why you want to reward super short
docs at all.

Going back to SSS (SweetSpotSimilarity) as an example, did you consider
using a sweetspot that went from 0 to your pivot (so all docs with length
less than or equal to the pivot/average length get an equal length boost)?
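
To make the comparison concrete, here is a rough sketch of the two length
norms being contrasted: the plain 1/sqrt(length) and a sweet spot that is
flat from 0 up to the pivot. The sweet spot formula is modeled on what
SweetSpotSimilarity computes as far as I recall it; the function names, the
steepness of 0.5, and the pivot of 100 are assumptions for illustration only.

```python
import math


def classic_norm(length):
    # Plain 1/sqrt(length): the shorter the doc, the bigger the norm.
    return 1.0 / math.sqrt(length)


def sweet_spot_norm(length, lo, hi, steepness=0.5):
    # Flat (1.0) for lo <= length <= hi, decaying outside that range.
    # Assumed shape, modeled on SweetSpotSimilarity's length norm.
    distance = abs(length - lo) + abs(length - hi) - (hi - lo)
    return 1.0 / math.sqrt(steepness * distance + 1.0)


PIVOT = 100  # hypothetical average doc length

for n in (3, 50, 100, 103, 400):
    print(n, round(classic_norm(n), 4), round(sweet_spot_norm(n, 0, PIVOT), 4))
```

With the sweet spot set to 0..pivot, a 3 word doc and an average length doc
get exactly the same norm, and only docs longer than the pivot are penalized.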

...that's actually what i started with when i first wrote SSS, but then i
realized that in the case of really rare words (where the highest tf of
all docs is just 1) the tf was the only discriminating factor in the
scores of the various documents -- so it didn't matter if the norm for a 3
word doc was only slightly higher than (or equal to) that of an average
length doc -- the 3 word doc would get a higher (or equal) score.
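
The rare word case can be sketched with a toy score. This is not Lucene's
real scoring, just sqrt(tf) * idf * norm as a simplified stand-in; the doc
lengths, the idf value, and the helper names are all made up for
illustration.

```python
import math


def score(tf, idf, norm):
    # Simplified stand-in for a tf-idf style score with a length norm.
    return math.sqrt(tf) * idf * norm


def plateau_norm(length, pivot):
    # Hypothetical norm: equal boost up to the pivot, decaying beyond it.
    return 1.0 if length <= pivot else 1.0 / math.sqrt(length - pivot + 1)


def classic_norm(length):
    return 1.0 / math.sqrt(length)


doc_lengths = {"three_word_doc": 3, "average_doc": 100, "long_doc": 400}
PIVOT = 100  # assumed average length
IDF = 5.0    # rare term, same idf for every doc
TF = 1       # the term occurs exactly once in each doc

plateau = {d: score(TF, IDF, plateau_norm(n, PIVOT)) for d, n in doc_lengths.items()}
classic = {d: score(TF, IDF, classic_norm(n)) for d, n in doc_lengths.items()}

print(plateau)
print(classic)
```

With tf fixed at 1 everywhere, the norm decides the ordering: the plateau
leaves the 3 word doc tied with the average doc, while 1/sqrt(length) puts
it clearly on top.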


-Hoss

