lucene-dev mailing list archives

From Christoph Goller <gol...@detego-software.de>
Subject Re: RE : Term Collection Frequency?
Date Wed, 04 Aug 2004 18:58:57 GMT
ABDOU Samir wrote:
> Thanks,
> 
> 
>>>What about the frequency of any given term in the whole collection!?
> 
> 
>>IndexReader.docFreq(Term t)
> 
> 
> This method doesn't give us the collection frequency of the given term
> t, but the number of documents in which the term appears.
> 
> Here is an example of what I want:
> 
> -------------------------------
> We have this table for a term T
> 
> Doc ID : 0, 1, 2, 3, 4
> Frequency : 3, 5, 4, 2, 5  
> 
> That is, the term appears 3 times in document 0, 5 times in
> document 1, and so on.
> 
> So the collection frequency of this term would be 3+5+4+2+5 = 19
> 
> N.B.: Calculating this for each term at runtime would be very expensive.
> Is it possible to calculate and store this information during indexing?
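The sketch below uses the table from the question to show the difference between the two numbers. Document frequency, which is what `IndexReader.docFreq(Term)` reports, counts the documents that contain the term at least once. Collection frequency sums the occurrences. The class and method names are hypothetical and are for illustration only.

```java
// Hypothetical illustration: document frequency vs. collection frequency
// for the per-document counts of term T given in the question.
public class FrequencyExample {

    // Number of documents in which the term occurs at least once
    // (the quantity IndexReader.docFreq reports).
    public static int documentFrequency(int[] freqs) {
        int df = 0;
        for (int f : freqs) {
            if (f > 0) {
                df++;
            }
        }
        return df;
    }

    // Total number of occurrences of the term across all documents.
    public static long collectionFrequency(int[] freqs) {
        long cf = 0;
        for (int f : freqs) {
            cf += f;
        }
        return cf;
    }

    public static void main(String[] args) {
        // Frequencies of term T in documents 0..4.
        int[] freqs = {3, 5, 4, 2, 5};
        System.out.println("docFreq = " + documentFrequency(freqs));              // docFreq = 5
        System.out.println("collection frequency = " + collectionFrequency(freqs)); // collection frequency = 19
    }
}
```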

Currently the only way to get this information is the expensive one:
you have to iterate over the term's TermDocs and sum up all the
frequencies (as Julien already said).
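Below is a minimal sketch of that loop. With Lucene you would obtain the enumeration from `reader.termDocs(new Term(field, text))` and call `next()` and `freq()` on it. This is an assumption about how you would wire it up, not code from this thread. So that the sketch compiles without the Lucene jar, it declares a small stand-in interface, `TermDocsLike`, mirroring those methods. That interface and the array-backed `ArrayTermDocs` are hypothetical. The cost is linear in the term's posting list, which is why doing this for every term at query time is expensive.

```java
public class CollectionFrequency {

    // Hypothetical stand-in for the subset of Lucene's
    // org.apache.lucene.index.TermDocs used here. Lucene's versions of
    // next() and close() declare IOException; omitted here for brevity.
    public interface TermDocsLike {
        boolean next();
        int doc();
        int freq();
        void close();
    }

    // Walks every document containing the term and sums its
    // within-document frequency, closing the enumeration afterwards.
    public static long sumFreqs(TermDocsLike termDocs) {
        long total = 0;
        try {
            while (termDocs.next()) {
                total += termDocs.freq();
            }
        } finally {
            termDocs.close();
        }
        return total;
    }

    // Hypothetical array-backed postings, used only to exercise sumFreqs.
    public static class ArrayTermDocs implements TermDocsLike {
        private final int[] docs;
        private final int[] freqs;
        private int pos = -1;

        public ArrayTermDocs(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        public boolean next() {
            pos++;
            return pos < docs.length;
        }

        public int doc() { return docs[pos]; }
        public int freq() { return freqs[pos]; }
        public void close() {}
    }

    public static void main(String[] args) {
        // Postings for term T from the example table in the question.
        TermDocsLike td = new ArrayTermDocs(new int[] {0, 1, 2, 3, 4},
                                            new int[] {3, 5, 4, 2, 5});
        System.out.println(sumFreqs(td)); // prints 19
    }
}
```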

Christoph


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

