lucene-java-user mailing list archives

From Karl Wettin
Subject Re: Document Similarities lucene(particularly using doc id's)
Date Mon, 20 Aug 2007 04:05:20 GMT

On 20 Aug 2007, at 05:19, Lokeya wrote:
> Grant Ingersoll-6 wrote:
>> On Aug 16, 2007, at 2:20 PM, Lokeya wrote:
>>> I want to find out the document content similarity

>> A common way of doing this is by calculating the cosine of the angle
>> between the two vectors.

> I can use getTermFreqVector() on IndexReader to get the vectors. But I
> am wondering which API should be used to compute the similarity between
> two such vectors and return a score (doc-doc similarity, in essence).

Bob Carpenter wrote an article on the subject for "Lucene in Action".
He also works on LingPipe, a semi-free piece of software that might be
helpful if your Greek kung fu is too weak.
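
For reference, the cosine measure Grant mentions can be sketched in plain
Java as below. This is only a minimal illustration, not an existing Lucene
API: the class CosineSimilarity and its methods toMap, cosine and norm are
hypothetical names. It assumes you copy the parallel arrays returned by
TermFreqVector.getTerms() and TermFreqVector.getTermFrequencies() into
term-to-frequency maps yourself, and it uses raw term frequencies with no
idf weighting.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: cosine similarity between two
// sparse term-frequency vectors represented as term -> frequency maps.
class CosineSimilarity {

    // Builds a term -> frequency map from parallel arrays, e.g. the arrays
    // obtained from a term frequency vector (assumed to have equal length).
    static Map<String, Integer> toMap(String[] terms, int[] freqs) {
        Map<String, Integer> vector = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            vector.put(terms[i], freqs[i]);
        }
        return vector;
    }

    // cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        // Only terms present in both documents contribute to the dot product.
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    // Euclidean length of the frequency vector.
    static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int freq : vector.values()) {
            sumOfSquares += (double) freq * freq;
        }
        return Math.sqrt(sumOfSquares);
    }
}
```

With non-negative frequencies the score ranges from 0.0 (no shared terms)
to 1.0 (same term distribution). Weighting each frequency by idf before
taking the cosine is a common refinement, left out here for brevity.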


