lucene-java-user mailing list archives

From "Michael McCandless" <luc...@mikemccandless.com>
Subject Re: IndexWriter.deleteDocuments(Term) vs IndexReader.deleteDocuments(Term)
Date Thu, 15 Mar 2007 22:57:16 GMT
"Antony Bowesman" <adb@teamware.com> wrote:
> The writer method does not return the number of deleted documents. Is
> there a technical reason why this is not done?
> 
> I am planning to see about converting my batch deletions using
> IndexReader to IndexWriter, but I'm currently using the return value to
> record stats.
> 
> Does the following give the same results?
> 
>    int beforeCount = writer.docCount();
>    writer.deleteDocuments(term);
>    int deleted = beforeCount - writer.docCount();
> 
> Given that I add and delete in batches, is there any benefit to switching
> to IndexWriter for deletions?

Good point; this is a difference between the two APIs.

IndexWriter doesn't actually delete your documents in
deleteDocuments. Instead, it buffers the terms to delete, and only
when it flushes (either because there are too many buffered delete
terms or too many buffered added docs) does it apply the deletes to
the on-disk segments.
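To make the buffering concrete, here is a stdlib-only toy model. It is
NOT Lucene's implementation: ToyWriter, maxBufferedDeleteTerms and the
map-based "documents" are all hypothetical, and the toy only models the
delete-term flush trigger (adds go straight to storage). It shows why a
buffering deleteDocuments has no count to return: nothing is removed
until flush() runs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class BufferedDeleteDemo {

    /** Hypothetical toy writer: deletes are queued and applied only on flush. */
    static class ToyWriter {
        // Documents already written to "disk" (adds go straight here in this toy).
        final List<Map<String, String>> onDisk = new ArrayList<>();
        // Queued delete terms, each stored as {field, value}.
        final List<String[]> bufferedDeleteTerms = new ArrayList<>();
        // Hypothetical flush threshold for buffered delete terms.
        final int maxBufferedDeleteTerms;

        ToyWriter(int maxBufferedDeleteTerms) {
            this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
        }

        void addDocument(Map<String, String> doc) {
            onDisk.add(doc);
        }

        // Returns void: nothing has been deleted yet, so there is no count
        // to report at this point.
        void deleteDocuments(String field, String value) {
            bufferedDeleteTerms.add(new String[] {field, value});
            if (bufferedDeleteTerms.size() >= maxBufferedDeleteTerms) {
                flush();
            }
        }

        // Applies every queued term to the stored docs; only here is the
        // number of removed documents known.
        int flush() {
            int removed = 0;
            for (String[] term : bufferedDeleteTerms) {
                Iterator<Map<String, String>> it = onDisk.iterator();
                while (it.hasNext()) {
                    if (term[1].equals(it.next().get(term[0]))) {
                        it.remove();
                        removed++;
                    }
                }
            }
            bufferedDeleteTerms.clear();
            return removed;
        }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter(3);
        for (int i = 0; i < 5; i++) {
            writer.addDocument(Map.of("id", "doc" + i));
        }
        writer.deleteDocuments("id", "doc1");
        writer.deleteDocuments("id", "doc2");
        System.out.println(writer.onDisk.size()); // 5: deletes still buffered
        writer.deleteDocuments("id", "doc3");     // third term triggers a flush
        System.out.println(writer.onDisk.size()); // 2
    }
}
```

Note that the count produced by flush() is swallowed when the flush is
triggered implicitly, which mirrors why the caller never sees it.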

So your idea of using IndexWriter.docCount() won't actually work (in
fact, that method doesn't reflect deletions at all until the segments
containing them are merged/optimized).
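A second hypothetical toy (again not Lucene code) illustrates the
docCount() point: a segment keeps its slot count after a delete, so the
before/after subtraction from the question yields 0. A live-document
count, analogous to what IndexReader.numDocs() reports, does drop, and
the slot count only shrinks once the segment is rewritten by a merge.
ToySegment and its methods are made up for illustration.

```java
import java.util.BitSet;

public class DocCountDemo {

    /** Hypothetical toy segment: deletes only mark slots in a bit set. */
    static class ToySegment {
        final int maxDoc;                    // slots written, deleted or not
        final BitSet deleted = new BitSet(); // marks deleted slots

        ToySegment(int maxDoc) {
            this.maxDoc = maxDoc;
        }

        void delete(int docId) {
            deleted.set(docId);
        }

        // Writer-side style count: deleted docs still occupy their slots.
        int docCount() {
            return maxDoc;
        }

        // Live documents only (what a reader's numDocs() is analogous to).
        int numDocs() {
            return maxDoc - deleted.cardinality();
        }

        // A merge copies only live docs, so deletions finally disappear.
        ToySegment merge() {
            return new ToySegment(numDocs());
        }
    }

    public static void main(String[] args) {
        ToySegment seg = new ToySegment(10);
        int before = seg.docCount();
        seg.delete(3);
        seg.delete(7);
        System.out.println(before - seg.docCount()); // 0, not 2
        System.out.println(seg.numDocs());           // 8
        System.out.println(seg.merge().docCount());  // 8
    }
}
```

With the real IndexWriter the subtraction is off for two reasons at
once: the deletes may still be buffered, and even applied deletes are
not reflected in docCount() before a merge.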

If getting the actual number of deleted docs is important, I think you
should keep using IndexReader to do the deletions.
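With IndexReader the stats pattern is simply to sum the return values:
in Lucene the call is reader.deleteDocuments(new Term(field, value)),
which returns an int. So the sketch below stays runnable without the
library, it uses a hypothetical stand-in, ToyReader, whose deletion is
immediate and therefore can report a count; deleteBatch is likewise a
made-up helper showing how batch statistics could be gathered.

```java
import java.util.BitSet;
import java.util.List;

public class ReaderDeleteStatsDemo {

    /** Hypothetical stand-in for a reader that applies deletes immediately. */
    static class ToyReader {
        private final List<String> ids;           // one "id" value per doc
        private final BitSet deleted = new BitSet();

        ToyReader(List<String> ids) {
            this.ids = ids;
        }

        // Same shape as an int-returning deleteDocuments: the deletion happens
        // now, so the number of newly deleted docs can be returned.
        int deleteDocuments(String id) {
            int n = 0;
            for (int doc = 0; doc < ids.size(); doc++) {
                if (!deleted.get(doc) && ids.get(doc).equals(id)) {
                    deleted.set(doc);
                    n++;
                }
            }
            return n;
        }
    }

    // Hypothetical helper: deletes a batch and returns the total for stats.
    static int deleteBatch(ToyReader reader, List<String> idsToDelete) {
        int total = 0;
        for (String id : idsToDelete) {
            total += reader.deleteDocuments(id);
        }
        return total;
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader(List.of("a", "b", "b", "c"));
        System.out.println(deleteBatch(reader, List.of("b", "x", "a"))); // 3
    }
}
```

Because each call reports only newly deleted docs, repeating a term in
a later batch adds 0 to the total instead of double-counting.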

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org