lucene-java-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: A question about scoring function in Lucene
Date Wed, 15 Dec 2004 19:12:05 GMT
Chuck Williams wrote:
> I believe the biggest problem with Lucene's approach relative to the pure vector space
> model is that Lucene does not properly normalize.  The pure vector space model implements
> a cosine in the strictly positive sector of the coordinate space.  This is guaranteed intrinsically
> to be between 0 and 1, and produces scores that can be compared across distinct queries (i.e.,
> "0.8" means something about the result quality independent of the query).
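
For reference, the cosine measure described in the quoted text can be sketched as below. This is only an illustration of the pure vector space cosine, not Lucene's scoring code; the class name CosineSketch, the sparse-map representation, and the term weights are assumptions made up for the example. With non-negative weights the result always falls between 0 and 1, which is the bound under discussion. Whether a given value is comparable across queries is a separate question that the code does not settle.

```java
import java.util.Map;

// Hypothetical illustration: cosine between two sparse, non-negative
// term-weight vectors. Not taken from Lucene.
public class CosineSketch {

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sumOfSquares = 0.0;
        for (double w : v.values()) {
            sumOfSquares += w * w;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Dot product divided by the product of the two lengths.
    // Returns 0 when either vector is empty or all zeros.
    static double cosine(Map<String, Double> query, Map<String, Double> doc) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : query.entrySet()) {
            Double w = doc.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double nq = norm(query);
        double nd = norm(doc);
        if (nq == 0.0 || nd == 0.0) {
            return 0.0;
        }
        return dot / (nq * nd);
    }

    public static void main(String[] args) {
        // Made-up weights, for illustration only.
        Map<String, Double> query = Map.of("lucene", 1.0, "scoring", 1.0);
        Map<String, Double> doc = Map.of("lucene", 2.0, "index", 1.0);
        System.out.println(cosine(query, doc));
    }
}
```

Because the division by both vector lengths removes the effect of document and query size, the value depends only on the angle between the two vectors.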

I question whether such scores are more meaningful.  Yes, such scores 
would be guaranteed to be between zero and one, but would 0.8 really be 
meaningful?  I don't think so.  Do you have pointers to research that 
demonstrates this?  E.g., is there evidence that, when such a scoring 
method is used, thresholding by score is useful across queries?

Doug
