lucene-dev mailing list archives

From Christoph Goller <>
Subject Re: About Hit Scoring
Date Sun, 31 Oct 2004 16:54:46 GMT
Chuck Williams wrote:
> That's an interesting point that helps to better analyze the situation.
> It seems to me the units are arbitrary and so the distance in this case
> is not very meaningful.  I don't believe Lucene actually uses the
> document vector -- it uses the orthogonal projection of the document
> vector into the hyperspace of query terms, since it only considers
> document vector terms corresponding to query vector terms.  

For the distance of a document vector to the query-hyperplane, the
other directions of the document vector are irrelevant.

> The distance
> from the tip of the projected document vector to the hyperplane
> orthogonal to the query vector (within the query hyperspace) does not
> seem that meaningful, even if the units were clear and natural.
> Document vectors at different angles and arbitrarily large distances
> from the query vector can have the same length to this plane.
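
Both points above can be checked with a small sketch. It is not Lucene
code: the term weights are made-up illustrative numbers, and the helper
name distance_to_query_hyperplane is hypothetical. It computes the signed
distance from the tip of a document vector to the hyperplane through the
origin that is orthogonal to the query vector, which is (d . q) / |q|.

```python
import math

def distance_to_query_hyperplane(doc, query):
    """Signed distance from the tip of `doc` to the hyperplane through the
    origin that is orthogonal to `query`, i.e. (doc . query) / |query|.
    Vectors are sparse dicts mapping term -> weight (illustrative only)."""
    # Only terms that occur in the query contribute to the dot product.
    dot = sum(weight * doc.get(term, 0.0) for term, weight in query.items())
    norm = math.sqrt(sum(weight * weight for weight in query.values()))
    return dot / norm

# Hypothetical weights, chosen only to make the geometry visible.
query = {"lucene": 1.0, "scoring": 1.0}
doc_a = {"lucene": 2.0, "scoring": 2.0}                # parallel to the query
doc_b = {"lucene": 2.0, "scoring": 2.0, "java": 5.0}   # extra non-query term
doc_c = {"lucene": 4.0}                                # 45 degrees off the query

for name, doc in (("a", doc_a), ("b", doc_b), ("c", doc_c)):
    print(name, distance_to_query_hyperplane(doc, query))
```

doc_a and doc_b get the same distance because the "java" component never
enters the dot product. doc_c gets the same distance as well, although it
points in a different direction within the query subspace.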

The term frequency is normalized by the field length, and idf comes in
as well. So the units do at least have some
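
As a rough sketch of how these factors combine into a per-term document
weight: the formulas below are assumed to mirror the defaults of Lucene's
DefaultSimilarity (tf = sqrt(freq), idf = ln(numDocs / (docFreq + 1)) + 1,
lengthNorm = 1 / sqrt(numTokens)). The function term_weight and all the
numbers are hypothetical, and query-side factors and boosts are left out.

```python
import math

def tf(freq):
    # Dampened term frequency.
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # Rare terms (small doc_freq) get a larger factor.
    return math.log(num_docs / (doc_freq + 1)) + 1.0

def length_norm(num_tokens):
    # Longer fields get a smaller factor.
    return 1.0 / math.sqrt(num_tokens)

def term_weight(freq, num_tokens, doc_freq, num_docs):
    """Document-side weight of one term (hypothetical helper)."""
    return tf(freq) * idf(doc_freq, num_docs) * length_norm(num_tokens)

# Same raw frequency, different field lengths and term rarity.
short_field = term_weight(freq=2, num_tokens=10, doc_freq=5, num_docs=1000)
long_field = term_weight(freq=2, num_tokens=1000, doc_freq=5, num_docs=1000)
common_term = term_weight(freq=2, num_tokens=10, doc_freq=900, num_docs=1000)

print(short_field, long_field, common_term)
```

With the same raw frequency, the term weighs more in the short field than
in the long one, and more when it is rare across the index than when it
is common.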

> From a practical standpoint, I still think it is important to have
> meaningful normalized final scores so that applications can interpret
> these scores, for example to present results to users in a manner that
> depends on the relevance of the individual results.  This seems easy to
> do in a natural way along the lines of my last proposal (boost-weighted
> normalization, possibly including some other factors).

I still agree that it would be great to have scores that could be compared
between different queries.

