lucene-java-user mailing list archives

From Karl Wettin <karl.wet...@gmail.com>
Subject Re: Document Similarities lucene(particularly using doc id's)
Date Mon, 20 Aug 2007 04:05:20 GMT

On 20 Aug 2007, at 05:19, Lokeya wrote:
> Grant Ingersoll-6 wrote:
>> On Aug 16, 2007, at 2:20 PM, Lokeya wrote:
>>> I want to find out the document content similarity

>> A common way of doing this is by calculating the cosine of the angle
>> between the two vectors.

> I can use getTermFreqVector() on IndexReader to get the vectors. But I am
> wondering what API should be used to find the similarity between two such
> vectors, which would give a score (doc-doc similarity, in essence).

Bob Carpenter wrote an article on the subject for "Lucene in Action". He
also works on LingPipe, a semi-free piece of software that might be
helpful if your Greek kung fu is too weak.

<http://www.alias-i.com/lingpipe/docs/api/com/aliasi/spell/TfIdfDistance.html>
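
For what it's worth, the cosine measure Grant mentions is small enough to
sketch by hand. The following is only an illustration, not a Lucene or
LingPipe API: the class name CosineSimilarity and the Map-based vector
representation are assumptions made for this example. It presumes you have
already copied the terms and frequencies of each document's term vector
into a Map<String, Integer>, and it uses raw term frequencies with no IDF
weighting.

```java
import java.util.Map;

/**
 * Hypothetical helper (not part of Lucene): cosine similarity between two
 * sparse term-frequency vectors, each given as a term -> frequency map.
 */
public class CosineSimilarity {

    /** Returns dot(a, b) / (|a| * |b|), or 0.0 if either vector is empty. */
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        // Only terms present in both documents contribute to the dot product.
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // avoid division by zero for empty vectors
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a term-frequency vector. */
    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }
}
```

Calling CosineSimilarity.cosine(a, b) gives a score between 0 (no shared
terms) and 1 (same term distribution). Raw counts favour frequent terms, so
you would probably want to scale each frequency by an IDF factor first,
which is roughly what the TfIdfDistance class linked above does for you.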



-- 
karl




