lucene-java-user mailing list archives

From "Grant Ingersoll" <gsing...@syr.edu>
Subject Re: similarity of two texts
Date Tue, 01 Jun 2004 14:13:09 GMT
Sorry about the misspelling, Erik!

Thanks for the insight.

Explain is my friend as an end user, but it, too, is confusing at the code level!  At some
point I will have time to dig deeper and step through the scoring code.

>>> erik@ehatchersolutions.com 06/01/04 09:39AM >>>
On Jun 1, 2004, at 9:24 AM, Grant Ingersoll wrote:
> Hey Eric,

Eri*K*  :)

> What did you do to calc similarity?

I computed the angle between two vectors.  The vectors are obtained 
from IndexReader.getTermFreqVector(docId, "field").
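A minimal sketch of that angle computation. It assumes each document's vector has already been copied out of Lucene's TermFreqVector (its getTerms() and getTermFrequencies() arrays) into parallel String[]/int[] arrays. It also assumes raw term frequencies as the weights; Erik doesn't say which weighting he used. The class and method names are hypothetical, and the code uses modern Java (generics, Map.merge), which postdates this thread. The cosine of the angle is the dot product of the two vectors divided by the product of their lengths:

```java
import java.util.HashMap;
import java.util.Map;

public class TermVectorAngle {

    /** Builds a term -> frequency map from parallel arrays (the shape TermFreqVector exposes). */
    static Map<String, Integer> toMap(String[] terms, int[] freqs) {
        Map<String, Integer> m = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            m.merge(terms[i], freqs[i], Integer::sum);
        }
        return m;
    }

    /** Cosine of the angle between two term-frequency vectors; 0.0 if either is empty. */
    static double cosine(String[] termsA, int[] freqsA, String[] termsB, int[] freqsB) {
        Map<String, Integer> a = toMap(termsA, freqsA);
        Map<String, Integer> b = toMap(termsB, freqsB);
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            double fa = e.getValue();
            normA += fa * fa;
            Integer fb = b.get(e.getKey());
            if (fb != null) dot += fa * fb;   // only shared terms contribute to the dot product
        }
        for (int fb : b.values()) normB += (double) fb * fb;
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Angle in radians; the cosine is clamped to [-1, 1] to absorb rounding error. */
    static double angle(String[] termsA, int[] freqsA, String[] termsB, int[] freqsB) {
        double c = cosine(termsA, freqsA, termsB, freqsB);
        return Math.acos(Math.max(-1.0, Math.min(1.0, c)));
    }

    public static void main(String[] args) {
        String[] t1 = {"lucene", "search", "vector"};
        int[] f1 = {2, 1, 1};
        String[] t2 = {"lucene", "score", "vector"};
        int[] f2 = {1, 1, 2};
        System.out.println("cosine = " + cosine(t1, f1, t2, f2));
        System.out.println("angle  = " + angle(t1, f1, t2, f2));
    }
}
```

A smaller angle (cosine closer to 1) means the two documents use their terms in more similar proportions, regardless of document length.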

>   I haven't had time, but was thinking of ways to add the ability to 
> get the similarity score (as calculated when doing a search) given a 
> term vector (or just a document id).

It would be quite compute-intensive to do something like this.  This 
could be done through a custom sort as well, if applying it at the 
scoring level doesn't work.  I haven't given any thought to how this 
could work for scoring or sorting before, but it does sound quite 
interesting.
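The cost Erik mentions shows up as soon as the results are ranked by similarity to one reference document: every candidate needs its own full vector comparison. Here is a brute-force sketch of that "custom sort" idea in plain Java. It assumes the term vectors are already loaded into term -> frequency maps keyed by document id. It does not use Lucene's own sorting API, and all names are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SimilarityRanker {

    /** Cosine similarity of two term -> frequency maps; 0.0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int fa = e.getValue();
            normA += (double) fa * fa;
            Integer fb = b.get(e.getKey());
            if (fb != null) dot += (double) fa * fb;
        }
        for (int fb : b.values()) normB += (double) fb * fb;
        return (normA == 0 || normB == 0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns candidate doc ids ordered from most to least similar to the reference vector. */
    static List<Integer> rank(Map<String, Integer> reference,
                              Map<Integer, Map<String, Integer>> candidates) {
        Map<Integer, Double> scores = new HashMap<>();
        for (Map.Entry<Integer, Map<String, Integer>> e : candidates.entrySet()) {
            // One full comparison per document: this loop is where the cost grows.
            scores.put(e.getKey(), cosine(reference, e.getValue()));
        }
        List<Integer> ids = new ArrayList<>(scores.keySet());
        ids.sort((x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x)); // highest similarity first
            return byScore != 0 ? byScore : Integer.compare(x, y);      // ties: lower doc id first
        });
        return ids;
    }
}
```

The work grows with the number of candidates times the size of their vectors. That is why this is cheap for a page of search hits but expensive across a whole index.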

>   Any ideas on how to approach this would be appreciated.  The scoring 
> in Lucene has always been a bit confusing to me, despite looking at 
> the code several times, especially once you get into boolean queries, 
> etc.

No doubt that it is confusing - to me also.  But Explanation is your 
friend.

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org 
For additional commands, e-mail: lucene-user-help@jakarta.apache.org 


