lucene-dev mailing list archives

From "Niranjan Balasubramanian" <NBala...@syr.edu>
Subject Re: Term Collection Frequency?
Date Wed, 04 Aug 2004 20:21:40 GMT
Calculating the total occurrence count of a term across all of the documents in the collection
via the TermDocs route is costly if you do it at runtime for a probabilistic retrieval model.
However, this process can be moved offline: you can create a new index that has one Document
for each term in the original index, with a stored field holding the occurrence count calculated
by the offline process. This could save you a lot of runtime computation and also gives you
a place to store other collection-level statistics about a term.
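A minimal sketch of the offline pass described above, written in plain Java so it runs without Lucene on the classpath. The map of postings stands in for the original index, and the returned map stands in for the secondary one-Document-per-term index; `OfflineCollectionFrequency`, `Posting`, `build` and `lookup` are hypothetical names. Against a Lucene 1.4 index, the same loop would iterate `IndexReader.terms()` and sum `TermDocs.freq()` over `IndexReader.termDocs(term)`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: walk every term's postings once, sum the
 * within-document frequencies, and keep the totals for cheap lookups.
 */
final class OfflineCollectionFrequency {

    /** One posting: a document id and the term's frequency in that document. */
    static final class Posting {
        final int doc;
        final int freq;

        Posting(int doc, int freq) {
            this.doc = doc;
            this.freq = freq;
        }
    }

    /**
     * Offline pass: for each term, sum tf over its postings.
     * The in-memory map replaces the real index (assumption, to keep the
     * example self-contained).
     */
    static Map<String, Long> build(Map<String, List<Posting>> postings) {
        Map<String, Long> cf = new HashMap<>();
        for (Map.Entry<String, List<Posting>> e : postings.entrySet()) {
            long total = 0;
            for (Posting p : e.getValue()) {
                total += p.freq;
            }
            cf.put(e.getKey(), total);
        }
        return cf;
    }

    /** Runtime lookup: expected constant time, 0 for terms never seen. */
    static long lookup(Map<String, Long> table, String term) {
        return table.getOrDefault(term, 0L);
    }
}
```

Building the table costs one pass over all postings, roughly the work the runtime TermDocs approach repeats for every query term; doing it once offline reduces each scoring-time request to a single lookup (a hash probe here, a stored-field read in the secondary index).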

- Niranjan

Niranjan Balasubramanian
Software Engineer
Center For Natural Language Processing
(http://cnlp.syr.edu)
Syracuse University

>>> erik@ehatchersolutions.com 8/4/2004 11:34:40 AM >>>
On Aug 4, 2004, at 8:25 AM, ABDOU Samir wrote:
> What about the frequency of any given term in the whole collection!?

IndexReader.docFreq(Term t)

> Calculating this at runtime may considerably affect performance!

It's computed during indexing!  :)

	Erik
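`IndexReader.docFreq(Term)` returns the number of documents that contain the term, which differs from the total occurrence count asked about in this thread whenever a term repeats inside a document. The toy comparison below (hypothetical class name, documents given as pre-tokenized lists) shows the two numbers diverging:

```java
import java.util.List;

/**
 * Toy comparison of document frequency (number of documents containing
 * the term) with collection frequency (total occurrences across all
 * documents). Tokenization is out of scope; documents arrive as token lists.
 */
final class FrequencyComparison {

    /** Number of documents containing the term at least once. */
    static int documentFrequency(List<List<String>> docs, String term) {
        int df = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                df++;
            }
        }
        return df;
    }

    /** Total number of occurrences of the term across all documents. */
    static long collectionFrequency(List<List<String>> docs, String term) {
        long cf = 0;
        for (List<String> doc : docs) {
            for (String token : doc) {
                if (token.equals(term)) {
                    cf++;
                }
            }
        }
        return cf;
    }
}
```

For the documents `[a, b, a]`, `[b]`, `[a, c]`, `documentFrequency(docs, "a")` is 2 while `collectionFrequency(docs, "a")` is 3.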


>
> Thanks,
>
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Wednesday, 4 August 2004 12:25
> To: Lucene Developers List
> Subject: Re: Term Collection Frequency?
>
> The new term vector feature will give you this exact information for a
> particular document or field.
>
> 	Erik
>
>
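If term vectors are stored, the per-document counts can be summed to obtain collection frequencies. The sketch below models each document's vector as the parallel arrays that a Lucene `TermFreqVector` exposes through `getTerms()` and `getTermFrequencies()` (fetched per document with `IndexReader.getTermFreqVector(docNumber, field)` in the 1.4-era API; check this against your version). `TermVectorAggregator` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical aggregator: accumulates collection frequencies from
 * per-document term vectors given as parallel term/frequency arrays.
 */
final class TermVectorAggregator {

    private final Map<String, Long> totals = new HashMap<>();

    /** Adds one document's vector: terms[i] occurs freqs[i] times in it. */
    void add(String[] terms, int[] freqs) {
        if (terms.length != freqs.length) {
            throw new IllegalArgumentException("terms and freqs must have equal length");
        }
        for (int i = 0; i < terms.length; i++) {
            totals.merge(terms[i], (long) freqs[i], Long::sum);
        }
    }

    /** Collection frequency accumulated so far; 0 for unseen terms. */
    long collectionFrequency(String term) {
        return totals.getOrDefault(term, 0L);
    }
}
```

Because this still visits every document, it is probably best run offline too rather than at query time.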
> On Aug 4, 2004, at 3:59 AM, ABDOU Samir wrote:
>
>> Hi,
>>
>> In order to implement a new search model within Lucene (probabilistic),
>> I need the collection frequency of each term (the number of occurrences
>> of a term within a collection). So, what would be the best way to
>> implement this?
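In symbols, writing D for the set of documents and tf(t, d) for the number of occurrences of term t in document d (notation chosen here, not taken from the thread), the collection frequency being asked for and the document frequency are:

```latex
\mathrm{cf}(t) = \sum_{d \in D} \mathrm{tf}(t, d),
\qquad
\mathrm{df}(t) = \left|\{\, d \in D : \mathrm{tf}(t, d) > 0 \,\}\right|
```

Every document counted by df(t) contributes at least one occurrence to cf(t), so cf(t) >= df(t), with equality only when no document contains t more than once.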
>>
>> Any suggestions, ideas... are welcome.
>>
>> Thanks,
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org 
>> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org 
>

