lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: document boost not showing up in Explanation
Date Tue, 28 Dec 2004 10:05:51 GMT
On Tuesday 28 December 2004 08:37, Erik Hatcher wrote:
> 
> On Dec 27, 2004, at 9:54 PM, Vikas Gupta wrote:
> > I am using lucene-1.4.1.jar(with nutch). For some reason, the effect of
> > document boost is not showing up in the search results. Also, why is it
> > not a part of the Explanation
> 
> It actually is part of it....
> 
> >     Below is the 'explanation' of a sample query "solar". I don't see 
> > the
> > boost value (1.5514448) being used at all in the calculation of the
> > document score - from the 'explanation' below and also from the 
> > quality of
> > the search.
> >
> >     How can I see the effect of document boost?
> 
> Document boost is not stored in the index as-is.  A single 
> normalization factor is stored per-field and is computed at indexing 
> type using field and document boosts, as well as the length 
> normalization factor (and perhaps other factors I'm forgetting at the 
> moment?).

This also means that the explanation can only show a field normalisation
factor as it is available from the index.

One reason that boosting does necessarily not show up in the quality of
the search is that the byte encoding allows only 256 different values to
be stored.
The value stored in the index (called the norm) is the product of the
document boost factor, the field boost factor and the lengthNorm() of
the field.
For the search results to actually change because of the boost factors,
it is necessary that this stored factor is changed to another one of
the 256 possible.

The range of possible values stored in the index is roughly from
7x10^9 to 2x10^-9 . See:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/document/Field.html#setBoost(float)
and
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#encodeNorm(float)

The range of stored values (excluding the zero special case) is
about 7x10^9 / 2x10^-9 = 3.5x10^18. The 10 log of that is about 18.5 .
Per factor 10 there are about 255/18.5 = 13.8 encoded values.
So, a minimum boost factor that should change a document
score is about  log(13.8)/log(10) = 1.14 .
Since the default lengthNorm is the square root, a field length
should change by at least the square of that (roughly a factor 1.3)
to change the document score (assuming no hits in 
the changed field text.)

Finally, a change in document score only influences the document
ordering in the search results when another document has a score
that is within the range of the change.

Regards,
Paul Elschot.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message