lucene-java-user mailing list archives

From Venkateshprasanna
Subject Re: Extracting data from Lucene index files
Date Wed, 20 Dec 2006 03:20:52 GMT

> Take a look at TermDocs and TermEnum.

I need to get the frequency of each word in each of the documents I have.

This is what I could do with TermEnum and TermDocs. For each Term from
TermEnum, I have instantiated a TermDocs and for each doc, I am trying to
get the frequency of the Term.

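    IndexReader ir = IndexReader.open("index file");
    TermEnum terms = ir.terms();
    // Outer loop: every term in the index
    while (terms.next()) {
        TermDocs docs = ir.termDocs(terms.term());
        // Inner loop: every document that contains the current term
        while (docs.next()) {
            // This fetches the whole vector for the document, not just the current term
            TermFreqVector tfv = ir.getTermFreqVector(docs.doc(), "contents");
            String indexTerms[] = tfv.getTerms();
            int indexFreqs[] = tfv.getTermFrequencies();

            for (int i = 0; i < indexTerms.length; i++) {
                System.out.println(indexTerms[i] + " " + indexFreqs[i]);
            }
        }
    }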

But I see no way of getting the frequency of only 'that' term in 'that'
document. I have to get the entire vector, which prints every term of the
document on every pass and defeats the purpose of the two loops.
How can I overcome this?
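
To make the goal concrete, here is a minimal sketch in plain Java, with no Lucene dependency, of the output I am after: one frequency per (term, document) pair. The class name `TermFrequencySketch`, the `buildPostings` helper and the in-memory map are hypothetical stand-ins for the index, not Lucene API. For what it is worth, the TermDocs interface also declares a freq() method next to doc(), which is documented as the frequency of the term within the current document, so it may be what the inner loop should call instead of fetching the vector.

```java
import java.util.Map;
import java.util.TreeMap;

public class TermFrequencySketch {

    // Hypothetical stand-in for an index: term -> (document number -> frequency).
    static Map<String, Map<Integer, Integer>> buildPostings(String[] documents) {
        Map<String, Map<Integer, Integer>> postings = new TreeMap<>();
        for (int doc = 0; doc < documents.length; doc++) {
            // Naive whitespace tokenizer; a real analyzer would do more.
            for (String token : documents[doc].toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                // Count one more occurrence of this token in this document.
                postings.computeIfAbsent(token, t -> new TreeMap<>())
                        .merge(doc, 1, Integer::sum);
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        String[] documents = { "the quick brown fox", "the lazy dog saw the fox" };
        Map<String, Map<Integer, Integer>> postings = buildPostings(documents);

        // Outer loop: every term (the role TermEnum plays in the code above).
        for (Map.Entry<String, Map<Integer, Integer>> term : postings.entrySet()) {
            // Inner loop: every document containing that term (the role of TermDocs).
            for (Map.Entry<Integer, Integer> posting : term.getValue().entrySet()) {
                System.out.println(term.getKey()
                        + " doc=" + posting.getKey()
                        + " freq=" + posting.getValue());
            }
        }
    }
}
```

Each printed line carries exactly one term, one document number and one count, which is the shape of result the nested TermEnum/TermDocs loops are meant to produce.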
