lucene-java-user mailing list archives

From "Danilo Cicognani" <d.cicogn...@tinfo.it>
Subject Re: Max Frequency and Tf/Idf
Date Tue, 18 Apr 2006 09:45:38 GMT
Hi Grant Ingersoll and everybody.

> The Term Vector code can be used to get the term frequencies from a
> specific document.  Search this list, see the Lucene In Action book or
> look at http://www.cnlp.org/apachecon2005 for examples on how to use
> Term Vectors

Maybe I didn't explain my question well.
Below is the code we are using now. We were wondering whether we could get
more information from Lucene (for example, the maximum term frequency in a
document) in order to optimize the calculations.
The first method starts the Tf/Idf calculation using the class TTfIdf,
whose constructor is shown after it.

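// Requires org.apache.lucene.index.{IndexReader, Term, TermFreqVector}
// and java.io.IOException.
public TTfIdf getFieldTfIdf(long tid, long aid, String field)
		throws RisorseMultipleException, IOException,
		RisorsaNonTrovataException, TTfIdfException {
	reader = IndexReader.open(indexDir);
	try {
		int id = getDocumentId(tid, aid);
		// Term vector for this document and field; null if the field
		// was indexed without term vectors.
		TermFreqVector tfv = reader.getTermFreqVector(id, field);
		if (tfv == null)
			throw new TTfIdfException("No term vector stored for field " + field);
		int[] freqs = tfv.getTermFrequencies();
		String[] terms = tfv.getTerms();
		// Document frequency of each term, needed for the idf factor.
		int[] df = new int[terms.length];
		for (int i = 0; i < df.length; i++)
			df[i] = reader.docFreq(new Term(field, terms[i]));
		return new TTfIdf(terms, freqs, df, reader.numDocs());
	} finally {
		reader.close();	// close the reader even if an exception is thrown
	}
}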

public TTfIdf(String[] terms, int[] freqs, int[] df, int docs)
		throws TTfIdfException {
	// Message: "The term and frequency vectors have different lengths!"
	if (terms.length != freqs.length || terms.length != df.length)
		throw new TTfIdfException(
			"I vettori dei termini e delle frequenze sono di diversa lunghezza!");
	this.terms = terms;
	int l = freqs.length;
	// Maximum term frequency in the document, used to normalize tf.
	int maxfreq = 0;
	for (int i = 0; i < l; i++) {	// CAN BE OPTIMIZED IN SOME WAY?
		if (freqs[i] > maxfreq) maxfreq = freqs[i];
	}
	this.freqs = new double[l];
	double tf;
	double idf;
	for (int i = 0; i < l; i++) {	// CAN BE OPTIMIZED IN SOME WAY?
		tf = (double) freqs[i] / (double) maxfreq;	// normalized term frequency
		idf = Math.log((double) docs / (double) df[i]);	// inverse document frequency
		this.freqs[i] = tf * idf;
	}
}
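
For reference, the constructor above computes, for each term i, the weight
(freqs[i] / maxfreq) * ln(docs / df[i]). The following is only a minimal,
Lucene-free sketch of that same arithmetic, useful for checking the numbers
in isolation. The class name TfIdfSketch and the method weights are
hypothetical and are not part of Lucene or of the code above. It still needs
two passes, because the maximum must be known before normalizing; the only
change is that ln(docs) is computed once instead of once per term, which is
probably a very small saving.

```java
// Hypothetical helper for illustration; not part of Lucene or of TTfIdf.
final class TfIdfSketch {

    /** Returns (freqs[i] / max(freqs)) * ln(docs / df[i]) for each term i. */
    static double[] weights(int[] freqs, int[] df, int docs) {
        if (freqs.length != df.length) {
            throw new IllegalArgumentException("freqs and df differ in length");
        }
        int n = freqs.length;
        double[] w = new double[n];

        // ln(docs / df) == ln(docs) - ln(df), so ln(docs) is computed once.
        double logDocs = Math.log(docs);

        // First pass: find the maximum frequency and store freq * idf.
        int maxFreq = 0;
        for (int i = 0; i < n; i++) {
            if (freqs[i] > maxFreq) {
                maxFreq = freqs[i];
            }
            w[i] = freqs[i] * (logDocs - Math.log(df[i]));
        }

        // Nothing to normalize if there are no terms or all counts are zero.
        if (maxFreq == 0) {
            return new double[n];
        }

        // Second pass: normalize by the maximum frequency.
        double inverseMax = 1.0 / maxFreq;
        for (int i = 0; i < n; i++) {
            w[i] *= inverseMax;
        }
        return w;
    }
}
```

A call such as TfIdfSketch.weights(freqs, df, reader.numDocs()) would take
the same arrays that getFieldTfIdf builds. Results can differ from the
constructor's in the last few floating-point digits, because the operations
are applied in a different order.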

Do you have any suggestions?

**** 1000 KBye ****

 [) /\ |\| | |_ ()

web: www.ciconet.it
Web Portal Now: www.webportalnow.com



