lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Xiangyu Jin <x...@cs.virginia.edu>
Subject Lucene's ranking function VS Standard VSM model
Date Tue, 30 Nov 2004 16:17:38 GMT

 I have seen different versions of Lucene's ranking function
from the similarity document and Lucene user list.

Since I need to get document-doucment similaries,
so what I do is to issue the document as query directly.
I found it is different if we issue "computer computer"
to Lucene vers we issue it to standard VSM. The latter one
will treat "computer computer" as "computer" but Lucene
doesn't.

In order to illustrate my question more clear, I write
a more formalized document

http://www.cs.virginia.edu/~xj3a/lucene_ranking.pdf

so that there is no ambiguity of those formulas.

I am not asure whether I understand correctly, but the
major reason comes from Lucene's query parser. It defaults
each term appear once. If we issue a query term multiple
times in the query string, it will result in some un-expected
results.

For detail information, pls refer to the attached link.

thanks

xiangyu jin

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message