lucene-java-user mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: strange idf in Lucene 2.1
Date Thu, 12 Apr 2007 16:36:28 GMT
On 4/12/07, Bill Janssen <janssen@parc.com> wrote:
> > docfreqs (idfs) do not take into account deleted docs.
> > This is more of an engineering tradeoff rather than a feature.
> > If we could cheaply and easily update idfs when documents are deleted
> > from an index, we would.
>
> Wow.  So is it fair to say that the stored IDF is really the
> cumulative IDF for all the documents that have ever been in the index
> since it was last optimized?

Not quite... it's all documents that have been marked as deleted but
haven't yet been physically removed from the index.  Adding new
documents sometimes causes segments to be merged, and the resulting new
segment will have no deleted docs.
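To illustrate the point about merges, here is a toy model. It is hypothetical and is not Lucene's API or internals. A deletion only sets a bit, so the stored document frequency still counts the deleted doc. A "merge" then rewrites the segment with only the live docs, and at that point the count drops. The class name `ToySegment` and all of its methods are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical toy model of one index segment (not Lucene's API).
 * Deletions only set a bit, so docFreq keeps counting deleted docs
 * until merge() rewrites the segment without them.
 */
public class ToySegment {
    private final List<Set<String>> docs = new ArrayList<>(); // terms of each doc, by doc id
    private final BitSet deleted = new BitSet();              // deletion marks

    public void addDocument(Set<String> terms) { docs.add(terms); }

    // Marks a doc as deleted; its terms are NOT removed from the counts.
    public void deleteDocument(int docId) { deleted.set(docId); }

    public int maxDoc() { return docs.size(); }
    public int numDocs() { return docs.size() - deleted.cardinality(); }

    // Counts every doc containing the term, deleted or not.
    public int docFreq(String term) {
        int df = 0;
        for (Set<String> d : docs) {
            if (d.contains(term)) df++;
        }
        return df;
    }

    // Simulates a merge: the new segment holds only the live docs.
    public ToySegment merge() {
        ToySegment merged = new ToySegment();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) merged.addDocument(docs.get(i));
        }
        return merged;
    }

    public static void main(String[] args) {
        ToySegment seg = new ToySegment();
        seg.addDocument(Set.of("lucene", "idf"));
        seg.addDocument(Set.of("lucene"));
        seg.addDocument(Set.of("solr"));
        seg.deleteDocument(1);
        System.out.println(seg.docFreq("lucene") + " " + seg.numDocs()); // 2 2
        ToySegment merged = seg.merge();
        System.out.println(merged.docFreq("lucene") + " " + merged.numDocs()); // 1 2
    }
}
```

Before the merge, the deleted doc still inflates the frequency of "lucene". After the merge, the frequency matches the live documents.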

The difference between IndexReader.maxDoc() and numDocs() tells you
how many documents have been marked for deletion but still take up
space in the index.
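The next example makes the maxDoc()/numDocs() arithmetic and its effect on scoring concrete. It assumes a DefaultSimilarity-style formula, idf = ln(N / (df + 1)) + 1, with N taken from maxDoc(). In a real index, the inputs would come from `IndexReader.maxDoc()`, `IndexReader.numDocs()` and `IndexReader.docFreq(Term)`. Here they are plain ints so the code runs standalone. The class `IdfSkew` and the "live" document frequency are hypothetical. Lucene does not cheaply provide a deletion-aware docFreq, which is the tradeoff described above.

```java
/**
 * Hypothetical helper (not a Lucene class). It compares an idf computed from
 * deletion-blind statistics with one computed from live counts. It assumes
 * the DefaultSimilarity-style formula idf = ln(N / (df + 1)) + 1.
 */
public class IdfSkew {
    // DefaultSimilarity-style idf: n is the document count, df the doc frequency.
    static double idf(int df, int n) {
        return Math.log(n / (double) (df + 1)) + 1.0;
    }

    // Docs marked deleted but still taking up space: maxDoc() - numDocs().
    static int pendingDeletes(int maxDoc, int numDocs) {
        return maxDoc - numDocs;
    }

    public static void main(String[] args) {
        int maxDoc = 1000, numDocs = 600; // illustrative values, as if read from an IndexReader
        int storedDf = 400;               // docFreq that still counts deleted docs
        int liveDf = 50;                  // hypothetical docFreq over live docs only
        System.out.println("pending deletes: " + pendingDeletes(maxDoc, numDocs));
        System.out.printf("stored idf: %.3f%n", idf(storedDf, maxDoc));
        System.out.printf("live idf:   %.3f%n", idf(liveDf, numDocs));
    }
}
```

When many of a term's postings belong to deleted docs, its stored idf stays lower than its live idf would be. That lasts until a merge or optimize purges those docs.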

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

