incubator-lucy-user mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-user] $boost importance in weighting
Date Thu, 01 Dec 2011 17:35:41 GMT
On Thu, Dec 01, 2011 at 12:07:47PM +0200, goran kent wrote:
> The page at http://incubator.apache.org/lucy/docs/perl/Lucy/Plan/FieldType.html
> is a bit sparse on detail about the boost property.
 
> I'd like to get a better understanding of how and by how much its
> value influences score (rank) in search results - what's the formula
> used when boost is applied to a document's score?

It's pretty complicated.  Field boost, document boost, and field length
normalization are all consolidated, then they are reduced down to a single
8-bit float with a 3-bit mantissa and a 5-bit exponent.  Because of the
coarseness of the lossy data compression, small changes to boost may not even
move the needle.
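
To make the coarseness concrete, here is a minimal Python sketch of what keeping only 3 mantissa bits does to a value. It is an illustration, not Lucy's actual encoding routine: the function name `quantize`, the truncating rounding, and the implicit leading 1 are all assumptions, and the 5-bit exponent range is ignored (values are assumed to stay within it).

```python
import math

MANTISSA_BITS = 3  # from the message above: 3-bit mantissa


def quantize(value: float) -> float:
    """Truncate a positive float to MANTISSA_BITS bits of mantissa.

    Hypothetical helper for illustration only. It assumes an implicit
    leading 1 and ignores the limited exponent range.
    """
    if value <= 0.0:
        return 0.0
    frac, exp = math.frexp(value)      # value == frac * 2**exp, 0.5 <= frac < 1
    mantissa = frac * 2.0              # rescale to the 1.xxx form, in [1, 2)
    steps = 1 << MANTISSA_BITS         # 8 representable steps per power of two
    kept = math.floor((mantissa - 1.0) * steps) / steps + 1.0
    return math.ldexp(kept, exp - 1)


# A 5% boost collapses onto the same stored value as no boost at all,
# while doubling always lands on a different one.
print(quantize(1.0), quantize(1.05), quantize(2.0))
```

Under these assumptions there are only 8 distinct values between any power of two and the next, so a multiplier close to 1 often changes nothing after encoding.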

I wouldn't bother with a field or document boost multiplier that doesn't
change things by at least a factor of 2.

It's theoretically possible to calculate ceiling and floor values for boost,
but I don't know what the answers are.
 
> Finally, what are reasonable values (upper/lower) for boost when, in
> my case eg, I'd like to influence the score based on an external value
> (page rank), but not have my page rank completely skew the scores -
> just enough to promote pages which have an organic page rank value
> which should be considered to some degree (a very broad subject, I
> know).
 
Subtle rerankings are problematic because search engines are noisy.  Even the
best ones give you a bunch of junk you don't need.  We don't really care about
fine distinctions, because if you sample a handful of documents with identical
scores, odds are that they are *wildly* divergent in terms of what the user
wants.  We only care about big differences.

> My tests so far show that a boost value with a small variance in the
> mantissa has an almost zero influence on score/ranking.  My thinking
> is to boost with something akin to $boost+=LogN(PR) - i.e. between 0-10
> (log scale).  So this boils down to:  is using a scale of 1-10 a good
> idea w.r.t. the Lucy boost property to influence ranking, or 10x that
> value?

I'd try 1-100.  If that's too much, scale it back.
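
One way to experiment with that range is sketched below in Python. Everything here is an assumption made for illustration: the helper name `boost_from_page_rank`, the 0-10 page rank scale taken from the question, and the choice of a geometric mapping. The ceiling is a parameter so it can be scaled back if 100 turns out to be too much.

```python
def boost_from_page_rank(page_rank: float, max_boost: float = 100.0,
                         max_rank: float = 10.0) -> float:
    """Map a page rank in [0, max_rank] to a boost in [1, max_boost].

    Hypothetical helper: the mapping is geometric, so each page rank
    step multiplies the boost by the same factor.
    """
    # Clamp out-of-range inputs so the boost stays inside [1, max_boost].
    clamped = min(max(page_rank, 0.0), max_rank)
    return max_boost ** (clamped / max_rank)


# Page rank 0 gives no boost, page rank 10 gives the full ceiling.
for rank in (0, 5, 10):
    print(rank, boost_from_page_rank(rank))
```

With a ceiling of 100, each page rank step multiplies the boost by about 1.58, so pages need to be roughly two steps apart before the boost differs by the factor of 2 suggested earlier.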

Marvin Humphrey


