lucene-java-user mailing list archives

From Grant Ingersoll <>
Subject Re: Vector space implementation
Date Thu, 09 Apr 2009 13:59:41 GMT
Assuming you want to handle the vectors yourself, as opposed to
relying on the fact that Lucene itself implements the VSM, you should
index your documents with TermVector.YES.  That will give you the term
frequency on a per-document basis, but you will have to use the
TermEnum to get the document frequency.  All in all, this is not going
to be very efficient for you, but you should be able to build up a
matrix from it.
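
As a rough illustration of the matrix-building step, here is a minimal
sketch in plain Java.  It assumes you have already pulled a term ->
frequency map for each document out of the stored term vectors; that
extraction is not shown.  The class name TfIdfMatrix and its build
method are hypothetical helpers, not part of Lucene.  For simplicity
the sketch derives the document frequency from the maps themselves,
whereas against a real index you would read it from the TermEnum.  The
weight tf * ln(N / df) is one common TF/IDF variant, not necessarily
the formula Lucene's own scoring uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper (not a Lucene class): a dense document-by-term TF/IDF matrix. */
final class TfIdfMatrix {
    final List<String> terms;  // column labels, in sorted order
    final double[][] weights;  // weights[doc][termIndex]

    private TfIdfMatrix(List<String> terms, double[][] weights) {
        this.terms = terms;
        this.weights = weights;
    }

    /**
     * Builds the matrix from per-document term frequencies.
     * Assumption: docs.get(d) maps each term of document d to its raw count,
     * i.e. the information a stored term vector provides.
     */
    static TfIdfMatrix build(List<Map<String, Integer>> docs) {
        // Document frequency: the number of documents containing each term.
        Map<String, Integer> docFreq = new HashMap<>();
        for (Map<String, Integer> doc : docs) {
            for (String term : doc.keySet()) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }

        // Fix a stable column order for the vocabulary.
        List<String> terms = new ArrayList<>(new TreeSet<>(docFreq.keySet()));
        int n = docs.size();
        double[][] weights = new double[n][terms.size()];

        for (int d = 0; d < n; d++) {
            Map<String, Integer> doc = docs.get(d);
            for (int j = 0; j < terms.size(); j++) {
                String term = terms.get(j);
                int tf = doc.getOrDefault(term, 0);
                // Assumed weighting: tf * ln(N / df); other variants exist.
                double idf = Math.log((double) n / docFreq.get(term));
                weights[d][j] = tf * idf;
            }
        }
        return new TfIdfMatrix(terms, weights);
    }
}
```

Each row of weights is then one document vector.  A dense array is
only reasonable for small collections; with a real vocabulary you
would probably want a sparse representation instead.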

What is the problem you are trying to solve?

On Apr 9, 2009, at 2:33 AM, Andy wrote:

> Hello all,
> I'm trying to implement a vector space model using Lucene. I need to
> have a file (or an in-memory structure) with the TF/IDF weight of each
> term in each document. (In fact, that is a matrix with documents
> represented as vectors, in which the elements of each vector are the
> TF weights ...)
> Please Please help me on this
> contact me if you need any further info via
> Many Many thanks

Grant Ingersoll
