lucene-dev mailing list archives

From Chris Hostetter <>
Subject Re: search quality - assessment & improvements
Date Tue, 17 Jul 2007 01:24:31 GMT

: Basically it is
:   (1 - Slope) * Pivot + (Slope) * Doclen
: Where Pivot reflects on the average doc length, and
: Smaller Slope reduces the amount by which short docs
: are preferred over long ones. In collection with very

isn't that just a straight line with a slope set by the specified "Slope"?
your pivot just seems to affect the y-intercept (which would be the
lengthNorm for a field containing 0 terms) but doesn't that cancel out of
any scoring equation since the fieldNorm is multiplied in for all docs?

it seems like changing the pivot should affect the raw score values you
get back, but it doesn't seem like it would have much (if any) effect on
the relative scores of docs with different lengths

actually, i must be missing something about your calculation...
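
One way to test the "does the pivot cancel out" question is to plug in numbers. The sketch below assumes the quoted expression is used as-is as a per-document multiplicative factor; the function names and the sample lengths, pivots, and slope are made up for illustration and are not Lucene API.

```python
def pivoted_length(doc_len, pivot, slope):
    """The quoted expression: (1 - Slope) * Pivot + Slope * Doclen."""
    return (1.0 - slope) * pivot + slope * doc_len


def relative_value(short_len, long_len, pivot, slope):
    """Ratio of the expression for a short doc to that for a long doc."""
    return pivoted_length(short_len, pivot, slope) / pivoted_length(long_len, pivot, slope)


# Assumed sample values: a 50-term doc vs. a 200-term doc, slope 0.25.
slope = 0.25
for pivot in (100, 1000):
    ratio = relative_value(50, 200, pivot, slope)
    print(f"pivot={pivot}: ratio short/long = {ratio:.3f}")
```

Under these assumptions the ratio moves from about 0.70 to about 0.95 as the pivot goes from 100 to 1000. The pivot enters as an additive constant, and an additive constant does not cancel when two documents' factors are compared by ratio, so it would affect relative scores as well as raw ones.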

: long documents, a doc shorter than the pivot would be
: rewarded, but that same doc would be rewarded relatively
: less in a collection with shorter docs. So how much you
: reward adapts to the specific collection characteristics,
: without knowing these characteristics in advance.

..from what i can tell, your function rewards longer documents without
bounds ... did you mean: 1/((1 - Slope) * Pivot + (Slope) * Doclen) ?

(that doesn't look right either)
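
For comparison, here is a small sketch of how the reciprocal form behaves. It only evaluates the expression written above; whether this is the form the quoted message intended is not confirmed here, and the function name and sample values are assumptions.

```python
def pivoted_norm(doc_len, pivot, slope):
    """Reciprocal form: 1 / ((1 - Slope) * Pivot + Slope * Doclen)."""
    return 1.0 / ((1.0 - slope) * pivot + slope * doc_len)


# Assumed sample values: pivot of 100 terms, slope 0.25.
pivot, slope = 100.0, 0.25
for doc_len in (10, 100, 1000):
    # Scale by the pivot so a doc of exactly pivot length prints 1.0.
    print(doc_len, round(pivoted_norm(doc_len, pivot, slope) * pivot, 3))
```

With the reciprocal, the value falls as Doclen grows, a doc of exactly pivot length gets 1/Pivot, and docs shorter than the pivot get more than that, so long documents are no longer rewarded without bound.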

: I think both are not good enough for large dynamic collections.
: Both are good enough for experiments. But it should be more
: efficient in a working dynamic large system.

Hmmm... perhaps what we need is a generalization of the payload API to
allow storing/reading payloads on a per document, per field, or per index
basis ... along with some sort of "PayloadMerger" that could be used by
IndexWriter when segments get merged ... so you could write doc
length stats for each field when closing a segmentwriter, and then
merge those stats when merging segments, and when reading segments do a
quick calculation to compute the average across all segments.
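
A rough sketch of the bookkeeping this would involve, independent of any real payload API: each segment keeps a doc count and a total length per field, merging adds them, and the reader derives the average. `FieldLengthStats` and `average_length` are hypothetical names, not Lucene classes, and deleted docs are ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldLengthStats:
    """Hypothetical per-segment stats for one field (not a Lucene class)."""
    doc_count: int      # number of docs in the segment that have this field
    total_terms: int    # sum of the field's length over those docs

    def merge(self, other):
        """Stats for the segment produced by merging two segments."""
        return FieldLengthStats(self.doc_count + other.doc_count,
                                self.total_terms + other.total_terms)


def average_length(segment_stats):
    """Average field length across all segments, computed at read time."""
    docs = sum(s.doc_count for s in segment_stats)
    terms = sum(s.total_terms for s in segment_stats)
    return terms / docs if docs else 0.0


# Assumed sample segments: one large, one small.
segments = [FieldLengthStats(1000, 120_000), FieldLengthStats(10, 300)]
merged = segments[0].merge(segments[1])
print(average_length(segments), average_length([merged]))
```

Storing sums rather than per-segment averages keeps the merge exact: the average is the same before and after merging, whereas averaging the averages would overweight small segments.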

