lucene-java-user mailing list archives

From Andy <andykan1...@yahoo.com>
Subject Re: Vector space implementation
Date Thu, 09 Apr 2009 17:01:27 GMT

Well, I'm planning to get the term weights (say, as a matrix) and then use an adaptive
learning system to transform them into new weights, so that an index built from those weights
is optimized. It's just a test to see whether this hypothesis works or not.


--- On Thu, 4/9/09, Grant Ingersoll <gsingers@apache.org> wrote:

From: Grant Ingersoll <gsingers@apache.org>
Subject: Re: Vector space implementation
To: java-user@lucene.apache.org
Date: Thursday, April 9, 2009, 6:29 PM

Assuming you want to handle the vectors yourself, rather than relying on the fact that Lucene
itself implements the VSM, you should index your documents with TermVector.YES. That will
give you the term frequencies on a per-document basis, but you will have to use the TermEnum
to get the document frequency. All in all, this is not going to be very efficient, but you
should be able to build up a matrix from it.
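Once the two ingredients Grant mentions are in hand (per-document term frequencies from the term vectors, and per-term document frequencies), building the matrix is simple bookkeeping. Below is a minimal, self-contained sketch that assumes the data has already been pulled out of the index into plain Java maps. In the Lucene 2.x API, the per-document maps would come from IndexReader.getTermFreqVector(docId, field) (getTerms() / getTermFrequencies()) and the counts from TermEnum.docFreq(). The class and method names here (TfIdfMatrix, build, documentFrequencies) are hypothetical, and the weighting is the textbook w(t,d) = tf(t,d) * ln(N / df(t)), not the formula Lucene's DefaultSimilarity uses for scoring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/**
 * HYPOTHETICAL helper: builds a dense document-by-term TF-IDF matrix.
 * The inputs stand in for data read from a Lucene index (term vectors and
 * document frequencies); no Lucene classes are used here.
 */
public class TfIdfMatrix {
    public final List<String> terms;   // column labels, sorted alphabetically
    public final double[][] weights;   // weights[docIndex][termIndex]

    TfIdfMatrix(List<String> terms, double[][] weights) {
        this.terms = terms;
        this.weights = weights;
    }

    /** Counts how many documents contain each term (a stand-in for TermEnum.docFreq()). */
    public static Map<String, Integer> documentFrequencies(List<Map<String, Integer>> termFreqs) {
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> doc : termFreqs) {
            for (String term : doc.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Computes w(t, d) = tf(t, d) * ln(N / df(t)) for every document/term pair. */
    public static TfIdfMatrix build(List<Map<String, Integer>> termFreqs,
                                    Map<String, Integer> docFreqs) {
        int numDocs = termFreqs.size();
        List<String> terms = new ArrayList<>(new TreeSet<>(docFreqs.keySet()));
        Map<String, Integer> column = new HashMap<>();
        for (int i = 0; i < terms.size(); i++) {
            column.put(terms.get(i), i);
        }

        // Terms absent from a document keep the default weight 0.0.
        double[][] w = new double[numDocs][terms.size()];
        for (int d = 0; d < numDocs; d++) {
            for (Map.Entry<String, Integer> e : termFreqs.get(d).entrySet()) {
                Integer col = column.get(e.getKey());
                if (col == null) {
                    continue; // no document frequency for this term; skip it
                }
                double idf = Math.log((double) numDocs / docFreqs.get(e.getKey()));
                w[d][col] = e.getValue() * idf;
            }
        }
        return new TfIdfMatrix(terms, w);
    }

    public static void main(String[] args) {
        // Two toy documents described only by their term frequencies.
        List<Map<String, Integer>> tf = List.of(
                Map.of("lucene", 2, "vector", 1),
                Map.of("lucene", 1, "space", 3));
        TfIdfMatrix m = build(tf, documentFrequencies(tf));
        System.out.println(m.terms);
        for (double[] row : m.weights) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In the toy run, "lucene" appears in both documents, so its idf is ln(2/2) = 0 and its column is all zeros, while "vector" and "space" each get tf * ln 2. A dense double[][] is fine for a small experiment like the one described; for a real index with a large vocabulary a sparse per-document map would be the more practical choice.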

What is the problem you are trying to solve?



On Apr 9, 2009, at 2:33 AM, Andy wrote:

> Hello all,
> 
> I'm trying to implement a vector space model using Lucene. I need a file (or an in-memory
> structure) holding the TF/IDF weight of each term in each document. (In fact, that is a matrix
> with documents represented as vectors, in which the elements of each vector are the TF weights ...)
> 
> Please, please help me with this.
> Contact me at andykan1984@yahoo.com if you need any further info.
> Many, many thanks.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/


