lucene-java-user mailing list archives

From Mark Harwood
Subject Re: Doc-Doc Similarity Matrix Construction
Date Mon, 29 Jun 2009 19:22:43 GMT
See MoreLikeThis in the contrib/queries folder. It speeds up
similarity comparisons by using only a document's most significant
words as search terms.
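
To illustrate the idea of keeping only the most significant words, here is a minimal sketch in plain Java. It is not MoreLikeThis's actual code and does not use the Lucene API: the class SignificantTerms, its methods, and the smoothed TF-IDF formula are all assumptions made for the example. It ranks a document's terms by weight and keeps the top k, which could then be used as the search terms.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene.
final class SignificantTerms {

    // Assumed weighting: tf * (ln(numDocs / (df + 1)) + 1).
    // Lucene's own scoring may differ from this formula.
    static double weight(int tf, int df, int numDocs) {
        return tf * (Math.log((double) numDocs / (df + 1)) + 1.0);
    }

    // Returns the k terms of one document with the highest weight.
    // termFreqs: term -> count within this document.
    // docFreqs:  term -> number of documents in the collection containing it.
    static List<String> topTerms(Map<String, Integer> termFreqs,
                                 Map<String, Integer> docFreqs,
                                 int numDocs, int k) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            weights.put(e.getKey(), weight(e.getValue(), df, numDocs));
        }
        List<String> terms = new ArrayList<>(weights.keySet());
        // Highest weight first; ties broken alphabetically for determinism.
        terms.sort(Comparator
                .comparingDouble((String t) -> weights.get(t))
                .reversed()
                .thenComparing(Comparator.naturalOrder()));
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }
}
```

A very common word such as "the" gets a low weight despite a high count, so it drops out of the top k and the comparison touches far fewer terms per document.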

On 29 Jun 2009, at 20:14, Amir Hossein Jadidinejad wrote:

> Hi,
> This is my first experiment with Lucene. Please help me.
> I'm going to index a set of documents and create a feature vector for
> each of them. Each vector contains all the terms in the document,
> weighted using TF-IDF.
> After that, I want to compute the cosine similarity between all pairs
> of documents and produce a doc-doc similarity matrix. My document set
> is large, so it's important to have a scalable implementation.
> Would you please give me some guidelines or a to-do list?
> Thank you and kind regards.
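
For reference, the pipeline described in the question (TF-IDF vectors, then pairwise cosine similarity) can be sketched in plain Java as below. This is a sketch under stated assumptions, not a Lucene-based or scalable implementation: the class DocSimilarity, its method names, and the IDF formula ln(N / df) + 1 are hypothetical choices for the example, and documents are assumed to be already tokenized.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class for illustration only; not part of Lucene.
final class DocSimilarity {

    // Builds one sparse TF-IDF vector (term -> weight) per tokenized document.
    static List<Map<String, Double>> tfidf(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        List<Map<String, Integer>> tfs = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String t : doc) tf.merge(t, 1, Integer::sum);
            // Each distinct term counts once per document toward df.
            for (String t : tf.keySet()) df.merge(t, 1, Integer::sum);
            tfs.add(tf);
        }
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (Map<String, Integer> tf : tfs) {
            Map<String, Double> v = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                // Assumed IDF variant: ln(N / df) + 1.
                double idf = Math.log((double) docs.size() / df.get(e.getKey())) + 1.0;
                v.put(e.getKey(), e.getValue() * idf);
            }
            vectors.add(v);
        }
        return vectors;
    }

    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) sum += w * w;
        return Math.sqrt(sum);
    }

    // Cosine similarity of two sparse vectors; 0 if either is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        // Iterate over the smaller map and look up in the larger one.
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = (small == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double w = large.get(e.getKey());
            if (w != null) dot += e.getValue() * w;
        }
        double na = norm(a), nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Full symmetric doc-doc matrix: O(n^2) pairs, so only for small sets.
    static double[][] matrix(List<Map<String, Double>> vectors) {
        int n = vectors.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = cosine(vectors.get(i), vectors.get(j));
                sim[i][j] = s;
                sim[j][i] = s;
            }
        }
        return sim;
    }
}
```

The full matrix grows quadratically with the number of documents, so for a large collection each vector would likely need to be pruned to its most significant terms, or candidate pairs limited through index queries.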
