lucene-dev mailing list archives

From Christoph Goller <gol...@detego-software.de>
Subject Re: Term Collection Frequency?
Date Thu, 05 Aug 2004 09:32:15 GMT
Niranjan Balasubramanian wrote:
> Calculating the total occurrence count of a term across all documents in
> the collection via the TermDocs route is costly if you do it at runtime
> for a probabilistic retrieval model. However, this process could be taken
> offline: you can create a new index that has a Document for each term in
> the original index, with a stored field holding the occurrence count
> calculated by the offline process. This could save you a lot of runtime
> computation and also gives you the ability to store collection-level
> statistics about a term.
> 
> - Niranjan

For the problem of updating this second index when new documents come in
or documents are deleted (in the original document index), TermVectors
could be useful. They contain each document's terms and their frequencies,
which is the information needed to update the second index.
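
To make the bookkeeping concrete, here is a rough sketch of the update
logic. It is not Lucene code: the second index is modeled as an in-memory
counter, and a term vector is assumed to be a plain mapping of term to
in-document frequency. The class and method names are made up for
illustration.

```python
from collections import Counter


class CollectionFrequencyStore:
    """Hypothetical stand-in for the second index: term -> total count."""

    def __init__(self):
        self.counts = Counter()

    def add_document(self, term_vector):
        # term_vector maps each term to its frequency in the new document.
        for term, freq in term_vector.items():
            self.counts[term] += freq

    def delete_document(self, term_vector):
        # Subtract the deleted document's frequencies; drop terms that
        # no longer occur anywhere in the collection.
        for term, freq in term_vector.items():
            remaining = self.counts[term] - freq
            if remaining > 0:
                self.counts[term] = remaining
            else:
                self.counts.pop(term, None)

    def collection_frequency(self, term):
        return self.counts.get(term, 0)


store = CollectionFrequencyStore()
doc1 = {"lucene": 3, "index": 1}
doc2 = {"lucene": 2, "term": 4}
store.add_document(doc1)
store.add_document(doc2)
store.delete_document(doc1)
print(store.collection_frequency("lucene"))
```

Note that for a deletion the term vector has to be read before the
document is removed from the original index, otherwise there is nothing
left to subtract.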

Christoph



