lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From karl wettin <ka...@snigel.net>
Subject Re: Max Frequency and Tf/Idf
Date Tue, 18 Apr 2006 10:00:14 GMT

18 apr 2006 kl. 11.45 skrev Danilo Cicognani:

> Following is the code we are using now: we was considering the  
> possiblity to
> have more informations from Lucene (for example the maximum term  
> frequency
> in one document) to optimized the calculations.
> The first method is the one that start the calculation of Tf/Idf  
> using the
> class TTfIdf whose constructor is reported below.
>
> 		for(int i=0;i<l;i++){	// CAN BE OPTIMIZED IN SOME WAY?
> 			if(freqs[i]>maxfreq) maxfreq=freqs[i];
> 		}
> 		this.freqs=new double[l];
> 		double tf;
> 		double idf;
> 		for(int i=0;i<l;i++){	// CAN BE OPTIMIZED IN SOME WAY?
> 			tf=(double)freqs[i]/(double)maxfreq;
> 			idf=Math.log((double)docs/(double)df[i]);
> 			this.freqs[i]=tf*idf;
> 		}

Not quite sure what you do above, but I guess you could caclulate the  
information at index time. To persist it in the index, extend/hack  
TermFreqVector and related IO-classes.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message