lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: TermEnum with deleted documents
Date Thu, 07 May 2009 09:37:37 GMT
This is known & expected.

Lucene does not update the terms dictionary (that is, which terms are
in the index, and their document frequencies) in response to deleted docs.

It does update the TermDocs enumeration, i.e. once you get the TermDocs
for a given term and step through its docs, the deleted docs will not be
returned.

One workaround is to call IndexWriter.expungeDeletes, but that's a
costly operation (forces merges of any segments containing deletes).
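
To make the distinction concrete, here is a toy model in plain Java. It
is not Lucene code and does not reflect Lucene's actual file format;
ToySegment and all of its methods are hypothetical names made up for
illustration. It only sketches the behaviour described above: the
per-term document count is fixed when documents are added, deletions are
tracked separately, stepping through a term's docs filters out the
deleted ones, and only rewriting the segment drops stale terms.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy single-segment index. NOT Lucene code; every name here is hypothetical. */
final class ToySegment {
    // "Terms dictionary": term -> doc ids, written when docs are added.
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();
    // Deletions are only marked here; the postings above are never edited.
    private final BitSet deleted = new BitSet();
    private int nextDocId = 0;

    int addDocument(String... terms) {
        int docId = nextDocId++;
        for (String term : terms) {
            List<Integer> docs = postings.computeIfAbsent(term, k -> new ArrayList<>());
            // Record each doc at most once per term.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    void deleteDocument(int docId) {
        deleted.set(docId);
    }

    /** Analogous to the dictionary's docFreq: does not look at deletions. */
    int docFreq(String term) {
        List<Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    boolean hasTerm(String term) {
        return postings.containsKey(term);
    }

    /** Analogous to stepping through a term's docs: deleted docs are skipped. */
    List<Integer> liveDocs(String term) {
        List<Integer> live = new ArrayList<>();
        for (int docId : postings.getOrDefault(term, Collections.emptyList())) {
            if (!deleted.get(docId)) {
                live.add(docId);
            }
        }
        return live;
    }

    /** Analogous to expunging deletes: rewrite the segment without deleted docs. */
    ToySegment expungeDeletes() {
        ToySegment merged = new ToySegment();
        int[] newId = new int[nextDocId];
        for (int old = 0; old < nextDocId; old++) {
            newId[old] = deleted.get(old) ? -1 : merged.nextDocId++;
        }
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            List<Integer> live = new ArrayList<>();
            for (int old : entry.getValue()) {
                if (newId[old] >= 0) {
                    live.add(newId[old]);
                }
            }
            // Terms whose docs were all deleted vanish from the rewritten dictionary.
            if (!live.isEmpty()) {
                merged.postings.put(entry.getKey(), live);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        ToySegment seg = new ToySegment();
        seg.addDocument("apple", "pear");
        int doomed = seg.addDocument("apple", "kiwi");
        seg.deleteDocument(doomed);

        // The dictionary still lists "kiwi" and counts the deleted doc ...
        System.out.println("docFreq(kiwi)  = " + seg.docFreq("kiwi"));
        // ... but enumerating the term's docs skips it.
        System.out.println("liveDocs(kiwi) = " + seg.liveDocs("kiwi"));
        // After the rewrite the stale term is gone.
        System.out.println("after expunge, hasTerm(kiwi) = "
                + seg.expungeDeletes().hasTerm("kiwi"));
    }
}
```

In this toy, the cheap way to test whether a term still has live docs is
to step through its docs and stop at the first one returned, rather than
trusting the count stored in the dictionary.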

https://issues.apache.org/jira/browse/LUCENE-1613 was opened to gather
use cases / issues on this... if this is impacting your application,
can you post some details to that issue?

Mike

On Thu, May 7, 2009 at 1:04 AM, Antony Bowesman <adb@teamware.com> wrote:
> I am merging Index A to Index B.  First I read the terms for a particular
> field from index A and some of the documents in A get deleted.
>
> I then enumerate the terms on a different field also in index A, but the
> terms from the deleted document are still present.
>
> The termEnum.docFreq() also returns > 0 for those terms even though the docs
> are deleted.
>
> Should this be the case?  I have tried closing the reader between
> enumerations, but no difference.
>
> Antony
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
