lucene-dev mailing list archives

From Doron Cohen <DOR...@il.ibm.com>
Subject Re: search quality - assessment & improvements
Date Thu, 19 Jul 2007 19:28:31 GMT
> However ... I still think that if you really want
> a length norm that takes into account the average
> length of the docs, you want one that rewards docs
> for being near the average ...

... like SweetSpotSimilarity (SSS)

> it doesn't seem to make a lot of sense to me to say
> that a doc whose length is N% longer than the
> average length is significantly worse than docs whose
> length is N% shorter than the average length.

I don't understand why a doc should be punished
just for having a length different from the average
(i.e. regardless of whether it is longer or shorter).

The (evolving) way I understand it:
(a) Very long docs are likely to contain everything,
    so let's punish them to compensate;
(b) This is what the original doc-length-norm
    actually does;
(c) But then very short docs might be
    rewarded too much;
(d) Now we might get stupid (or erroneous)
    few-word docs as top results;
(e) To solve this, pivoted doc-length-norm punishes
    overly long docs (longer than the average) but only
    slightly rewards docs that are shorter than the average.

It makes sense to me (IR'ishly if I may say so).
The SSS way does not make sense to me that way.
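
To make the comparison concrete, here is a rough sketch of the three
length norms being discussed. It is only an illustration, not code from
Lucene: the function names, the slope of 0.25, the average length of 100
and the sweet-spot range of 50..150 are all assumed values picked for the
example. The first function follows the classic 1/sqrt(numTerms) norm,
the second is the usual pivoted form rescaled so that an average-length
doc gets 1.0, and the third is modeled on the plateau-shaped length norm
that SweetSpotSimilarity documents.

```python
import math


def default_norm(length):
    """Classic length norm: 1 / sqrt(number of terms)."""
    return 1.0 / math.sqrt(length)


def pivoted_norm(length, avg_length, slope=0.25):
    """Pivoted norm, scaled so a doc of average length scores 1.0.

    The slope value is an assumption for illustration only.
    """
    return 1.0 / ((1.0 - slope) + slope * (length / avg_length))


def sweet_spot_norm(length, lo, hi, steepness=0.5):
    """Plateau of 1.0 for lo <= length <= hi, decaying on both sides.

    lo, hi and steepness are hypothetical settings for this sketch.
    """
    spread = abs(length - lo) + abs(length - hi) - (hi - lo)
    return 1.0 / math.sqrt(steepness * spread + 1.0)


AVG = 100  # assumed average doc length, in terms

for length in (5, 50, 100, 200, 1000):
    # Express the default norm relative to an average-length doc so the
    # three columns are comparable.
    relative_default = default_norm(length) / default_norm(AVG)
    print(
        length,
        round(relative_default, 3),
        round(pivoted_norm(length, AVG), 3),
        round(sweet_spot_norm(length, 50, 150), 3),
    )
```

Under these assumptions the pivoted norm can never reward a short doc
by more than 1/(1 - slope) relative to an average one, while the
default norm keeps growing as the doc shrinks, which is points (c)
through (e) above. The sweet-spot variant treats a doc that is 10 terms
below the range exactly like one that is 10 terms above it, which is the
symmetry being questioned here.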

