lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vikas Gupta <vgu...@cs.utexas.edu>
Subject Re: document boost not showing up in Explanation
Date Thu, 30 Dec 2004 02:12:31 GMT
Thanks a lot Erik and Paul for the great explanation. I understand the
issues now.

I also found that the relevant code is in (or is called from) the function
DocumentWriter::addDocument().

On Tue, 28 Dec 2004, Paul Elschot wrote:

> On Tuesday 28 December 2004 08:37, Erik Hatcher wrote:
> >
> > On Dec 27, 2004, at 9:54 PM, Vikas Gupta wrote:
> > > I am using lucene-1.4.1.jar(with nutch). For some reason, the effect of
> > > document boost is not showing up in the search results. Also, why is it
> > > not a part of the Explanation
> >
> > It actually is part of it....
> >
> > >     Below is the 'explanation' of a sample query "solar". I don't see
> > > the
> > > boost value (1.5514448) being used at all in the calculation of the
> > > document score - from the 'explanation' below and also from the
> > > quality of
> > > the search.
> > >
> > >     How can I see the effect of document boost?
> >
> > Document boost is not stored in the index as-is.  A single
> > normalization factor is stored per-field and is computed at indexing
> > type using field and document boosts, as well as the length
> > normalization factor (and perhaps other factors I'm forgetting at the
> > moment?).
>
> This also means that the explanation can only show a field normalisation
> factor as it is available from the index.
>
> One reason that boosting does necessarily not show up in the quality of
> the search is that the byte encoding allows only 256 different values to
> be stored.
> The value stored in the index (called the norm) is the product of the
> document boost factor, the field boost factor and the lengthNorm() of
> the field.
> For the search results to actually change because of the boost factors,
> it is necessary that this stored factor is changed to another one of
> the 256 possible.
>
> The range of possible values stored in the index is roughly from
> 7x10^9 to 2x10^-9 . See:
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/document/Field.html#setBoost(float)
> and
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#encodeNorm(float)
>
> The range of stored values (excluding the zero special case) is
> about 7x10^9 / 2x10^-9 = 3.5x10^18. The 10 log of that is about 18.5 .
> Per factor 10 there are about 255/18.5 = 13.8 encoded values.
> So, a minimum boost factor that should change a document
> score is about  log(13.8)/log(10) = 1.14 .
> Since the default lengthNorm is the square root, a field length
> should change by at least the square of that (roughly a factor 1.3)
> to change the document score (assuming no hits in
> the changed field text.)
>
> Finally, a change in document score only influences the document
> ordering in the search results when another document has a score
> that is within the range of the change.
>
> Regards,
> Paul Elschot.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message