lucene-java-user mailing list archives

From Amir Hossein Jadidinejad <amir.jad...@yahoo.com>
Subject Re: Doc-Doc Similarity Matrix Construction
Date Mon, 29 Jun 2009 20:26:16 GMT
It's exactly my question: http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg04915.html

--- On Mon, 6/29/09, Amir Hossein Jadidinejad <amir.jadidi@yahoo.com> wrote:

From: Amir Hossein Jadidinejad <amir.jadidi@yahoo.com>
Subject: Doc-Doc Similarity Matrix Construction
To: java-user@lucene.apache.org
Date: Monday, June 29, 2009, 3:14 PM

Hi,
This is my first experiment with Lucene, so please help me.
I'm going to index a set of documents and create a feature vector for each of them. Each vector contains all the terms that belong to the document, weighted using TF-IDF.
After that, I want to compute the cosine similarity between all pairs of documents and produce a doc-doc similarity matrix. My document set is large, so it's important to have a scalable implementation.
Would you please give me a guideline or to-do list?
Thank you and kind regards.
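
For readers of the archive, the computation asked about above can be sketched roughly as follows. This is only an illustration in plain Python, not Lucene code and not an answer given in the thread. The function names (tfidf_vectors, similarity_matrix), the pre-tokenized input, and the idf variant log(1 + N/df) are all assumptions made for the example. Vectors are stored sparsely and L2-normalized, so cosine similarity reduces to a dot product, and an inverted index is used so that only documents sharing at least one term are ever compared.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Build sparse, L2-normalized TF-IDF vectors.

    docs: list of token lists (tokenization is assumed to be done already).
    Returns one {term: weight} dict per document.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        # Assumed weighting: raw term frequency times log(1 + N/df).
        vec = {t: c * math.log(1.0 + n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors


def similarity_matrix(vectors):
    """Return cosine similarities as a sparse {(i, j): score} dict with i < j.

    Pairs of documents with no term in common are omitted (similarity 0).
    """
    # Inverted index: term -> list of (doc_id, weight), in increasing doc_id.
    postings = defaultdict(list)
    for doc_id, vec in enumerate(vectors):
        for term, weight in vec.items():
            postings[term].append((doc_id, weight))

    sims = defaultdict(float)
    for plist in postings.values():
        # Each shared term adds its weight product to the pair's dot product.
        for a in range(len(plist)):
            i, wi = plist[a]
            for b in range(a + 1, len(plist)):
                j, wj = plist[b]
                sims[(i, j)] += wi * wj
    return dict(sims)


if __name__ == "__main__":
    docs = [
        ["lucene", "index", "search"],
        ["lucene", "index", "search"],
        ["cosine", "matrix"],
    ]
    for pair, score in sorted(similarity_matrix(tfidf_vectors(docs)).items()):
        print(pair, round(score, 4))
```

The cost of this sketch grows with the square of each posting list's length, so on a large collection very frequent terms dominate the running time. Dropping stop words or capping document frequency before building the vectors is a common way to keep this manageable, at the price of slightly approximate scores.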


      


      