lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: delete by docid in lucene 4
Date Thu, 12 Jul 2012 22:25:25 GMT
On Thu, Jul 12, 2012 at 6:17 PM, Simon Willnauer
<simon.willnauer@gmail.com> wrote:
> Sean seriously a couple of hundred docs a second, don't bother just
> use updateDocument. My benchmarks show that there is only a smallish
> impact during indexing especially with concurrent flushing in lucene
> 4. I don't know how resource intensive your analysis chain is but on a
> decent machine you can easily go > 20k docs a second with
> updateDocument.
>
> If you want to give deleteByDocid a try for kicks I'd be curious how
> you solve some of the really tricky issues! :)

This (add delete-by-docID to IndexWriter) has been fairly frequently
requested...

But the problem is that docIDs can change suddenly whenever a merge
commits, so I don't see how we can add it in general.
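
To make the renumbering concrete, here is a toy model in plain Java. It
is not Lucene code: ToyIndex and its methods are made up for
illustration, and it assumes a docID is just a document's position
across all segments and that a merge drops deleted slots.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Lucene code: a "segment" is a list of keys, and a
// document's docID is simply its position across all segments.
final class ToyIndex {
    private final List<List<String>> segments = new ArrayList<>();

    void addSegment(List<String> keys) {
        segments.add(new ArrayList<>(keys));
    }

    // Returns the current docID of the key, or -1 if it is absent.
    int docIdOf(String key) {
        int base = 0;
        for (List<String> seg : segments) {
            int i = seg.indexOf(key);
            if (i >= 0) {
                return base + i;
            }
            base += seg.size();
        }
        return -1;
    }

    // Deleted slots are marked with null, so docIDs stay stable
    // until the next merge.
    void delete(String key) {
        for (List<String> seg : segments) {
            int i = seg.indexOf(key);
            if (i >= 0) {
                seg.set(i, null);
                return;
            }
        }
    }

    // A merge rewrites all segments into one and drops deleted slots,
    // which renumbers every document that sat after a deletion.
    void mergeAll() {
        List<String> merged = new ArrayList<>();
        for (List<String> seg : segments) {
            for (String key : seg) {
                if (key != null) {
                    merged.add(key);
                }
            }
        }
        segments.clear();
        segments.add(merged);
    }
}
```

With segments [a, b] and [c, d], document d starts at docID 3. Deleting
b leaves it at 3, but after mergeAll() it sits at docID 2, so a docID
captured before the merge would now point at the wrong document.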

That said, there is an initial patch here:

    https://issues.apache.org/jira/browse/LUCENE-4203

It adds IW.tryDeleteDocument(AtomicReader reader, int docID), with the
requirement that the reader is a near-real-time reader obtained from
the writer.  The delete will succeed (return true) if that reader has
not yet been merged away, else it fails (returns false) and you have
to do the delete the "normal" way (by Term).
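
The calling pattern would look roughly like the sketch below. To keep
it self-contained it does not use Lucene at all: DocDeleter is a
hypothetical stand-in for the writer, its method names only mirror the
description above, and DeleteHelper, idField and idValue are invented
for the example. The exact signatures in the patch may differ.

```java
// Hypothetical stand-in for the writer described above. The real class
// would be Lucene's IndexWriter; these signatures are assumptions.
interface DocDeleter {
    // Returns true if the delete by docID was applied, false if the
    // segment behind the reader has already been merged away.
    boolean tryDeleteDocument(Object nrtReader, int docID);

    // The "normal" delete: remove every document matching field:value.
    void deleteByTerm(String field, String value);
}

final class DeleteHelper {
    private DeleteHelper() {}

    // Tries the fast docID path first and falls back to a delete by
    // Term. Returns true if the fast path succeeded, false otherwise.
    static boolean deleteDoc(DocDeleter writer, Object nrtReader, int docID,
                             String idField, String idValue) {
        if (writer.tryDeleteDocument(nrtReader, docID)) {
            return true;
        }
        // The reader was merged away, so the docID can no longer be
        // trusted; delete by a unique key instead.
        writer.deleteByTerm(idField, idValue);
        return false;
    }
}
```

The fallback only works if each document also carries a unique key
field, so the caller has to keep that key at hand alongside the docID.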

I won't have much time to get back to that issue in the near future so
feel free to take it!

Mike McCandless

http://blog.mikemccandless.com


