| From | Amir Hossein Jadidinejad <amir.jad...@yahoo.com> |
|---|---|
| Subject | A simple Vector Space Model and TFIDF usage |
| Date | Mon, 29 Jun 2009 19:10:02 GMT |
Hi,
It's my first experiment with Lucene. Please help me.
I'm going to index a set of documents and create a feature vector for each of them. Each vector
contains all the terms that occur in the document, weighted using TF-IDF.
After that I want to compute the cosine similarity between all documents and produce a doc-doc
similarity matrix. My document set is large and it's important to have a scalable implementation.
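To make the goal concrete, here is a rough sketch of the computation in plain Java. It does not use Lucene at all, and everything in it is an assumption for illustration: the class name `TfidfSketch`, the method names, the pre-tokenized input, and the weighting (raw term count times ln(N/df); other TF-IDF variants exist).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain Java, no Lucene classes involved.
class TfidfSketch {

    // Builds one sparse TF-IDF vector (term -> weight) per tokenized document.
    static List<Map<String, Double>> tfidfVectors(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();            // document frequency per term
        List<Map<String, Integer>> counts = new ArrayList<>(); // raw term counts per document
        for (List<String> doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String term : doc) {
                tf.merge(term, 1, Integer::sum);
            }
            for (String term : tf.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
            counts.add(tf);
        }
        int n = docs.size();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (Map<String, Integer> tf : counts) {
            Map<String, Double> vec = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                // Assumed weighting: tf * ln(N / df). A term found in every document gets weight 0.
                double idf = Math.log((double) n / df.get(e.getKey()));
                vec.put(e.getKey(), e.getValue() * idf);
            }
            vectors.add(vec);
        }
        return vectors;
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity of two sparse vectors; defined as 0 if either vector has zero length.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Symmetric doc-doc matrix; computes each pair once (upper triangle) and mirrors it.
    static double[][] similarityMatrix(List<Map<String, Double>> vectors) {
        int n = vectors.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                sim[i][j] = sim[j][i] = cosine(vectors.get(i), vectors.get(j));
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("apple", "banana"),
                Arrays.asList("apple", "cherry"),
                Arrays.asList("banana", "banana"));
        double[][] sim = similarityMatrix(tfidfVectors(docs));
        for (double[] row : sim) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The all-pairs loop is quadratic in the number of documents, which is exactly the part I am worried about for a large collection.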
Could you please give me some guidelines or a to-do list?
Thank you and kind regards.