lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Antony Bowesman <...@teamware.com>
Subject Re: TermEnum with deleted dccuments
Date Mon, 11 May 2009 02:32:49 GMT
Hi Mike,

Thanks for the response.

I looked at that issue, but my case is trivial to fix.  I just keep the Set of 
terms I have deleted and ignore those during my second interation.

Thanks
Antony



Michael McCandless wrote:
> This is known & expected.
> 
> Lucene does not update the terms dictionary (meaning which terms are
> in the index, and their frequency) in response to deleted docs.
> 
> It does update TermDocs enumeration, ie once you get the TermDocs for
> a given term and step through its docs, the deleted docs will not be
> returned.
> 
> One workaround is to call IndexWriter.expungeDeletes, but that's a
> costly operation (forces merges of any segments containing deletes).
> 
> https://issues.apache.org/jira/browse/LUCENE-1613 was opened to gather
> use cases / issues on this... if this is impacting your application,
> can you post some details to that issue?
> 
> Mike
> 
> On Thu, May 7, 2009 at 1:04 AM, Antony Bowesman <adb@teamware.com> wrote:
>> I am merging Index A to Index B.  First I read the terms for a particular
>> field from index A and some of the documents in A get deleted.
>>
>> I then enumerate the terms on a different field also in index A, but the
>> terms from the deleted document are still present.
>>
>> The termEnum.docFreq() also returns > 0 for those terms even though the docs
>> are deleted.
>>
>> Should this be the case?  I have tried closing the reader between
>> enumerations, but no difference.
>>
>> Antony



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message