lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: term vector
Date Thu, 24 Jun 2004 00:30:05 GMT
On Jun 23, 2004, at 7:57 PM, Stefan Groschupf wrote:
> Is there a best practice to get the term vector of an document?

You get term vectors by "field", not document.  As for best practice, 
there is really only one way to go about getting the term vectors:

         TermFreqVector termFreqVector =
             reader.getTermFreqVector(i, "subject");

where reader is an IndexReader and i is the document number.

> Is there any experience to do any kind of feature selection for 
> dimension reducing like zipf laws or getting tf/idf of a term for the 
> complete corpora.

Document frequency of a term is in the term info file (.tis) and 
available through TermEnum (see IndexReader.terms()).

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message