lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: BufferedUpdateStreams breaks high performance indexing
Date Thu, 28 Jul 2016 13:35:10 GMT
Hmm not good.

If you are really only adding documents, you should be using
IndexWriter.addDocument, which won't buffer any delete terms, so the
applyDeletesAndUpdates call should be a no-op.  It also makes flushes more
efficient, since all of your indexing buffer goes to the added documents
rather than to buffered delete terms.  Are you using updateDocument?

Can you reproduce this slowness on a newer release?  Performance issues in
this method have been fixed in newer releases, e.g.
https://issues.apache.org/jira/browse/LUCENE-6161

Have you changed any IndexWriterConfig settings from defaults?

What are your unique id fields like?  How many bytes in length?
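
One quick way to answer the length question is to measure a representative
id directly. This is only a sketch: it assumes the ids are indexed as plain
string terms, which Lucene encodes as UTF-8 bytes, and the sample id and the
class name IdTermLength are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class IdTermLength {
    // Number of bytes the id occupies when encoded as UTF-8.
    static int utf8Length(String id) {
        return id.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Hypothetical id; substitute a real value from your own data.
        String sampleId = "doc-000000000001";
        System.out.println(sampleId + " -> " + utf8Length(sampleId) + " bytes");
    }
}
```

For pure ASCII ids the byte length equals the character count; non-ASCII
characters take two or more bytes each.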

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jul 28, 2016 at 5:01 AM, Bernd Fehling <bernd.fehling@uni-bielefeld.de> wrote:

> While trying to get higher performance for indexing it turned out that
> BufferedUpdateStreams is breaking indexing performance.
> public synchronized ApplyDeletesResult applyDeletesAndUpdates(...)
>
> In IndexWriterConfig I have setRAMBufferSizeMB=1024, and the Lucene 4.10.4
> API states:
> "Determines the amount of RAM that may be used for buffering added
> documents and deletions before they are flushed to the Directory.
> Generally for faster indexing performance it's best to flush by RAM
> usage instead of document count and use as large a RAM buffer as you can."
>
> Also setMaxBufferedDocs=-1 and setMaxBufferedDeleteTerms=-1.
>
> BD 0 [Wed Jul 27 13:42:03 GMT+01:00 2016; Thread-27890]: applyDeletes:
> infos=...
> BD 0 [Wed Jul 27 14:38:55 GMT+01:00 2016; Thread-27890]: applyDeletes took
> 3411845 msec
>
> That is about 56 minutes with no indexing, only applying deletes.
> What is it deleting?
>
> As the index gets bigger the time gets longer; currently it is 2.5 hours
> of waiting.
> I'm adding 96 million docs with unique ids: no duplicates, only adds, no
> deletes.
>
> Any suggestions on which config _really_ gives high performance
> indexing?
>
> Best regards,
> Bernd
>
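
As a sanity check on the "about 56 minutes" figure quoted above, the reported
duration can be compared with the gap between the two log timestamps. A
minimal sketch using only java.time; the class name ApplyDeletesElapsed is
made up for illustration, and the values are copied from the log lines in the
quoted message.

```java
import java.time.Duration;
import java.time.LocalTime;

final class ApplyDeletesElapsed {
    // Render a duration as whole minutes and seconds (fractions truncated).
    static String format(Duration d) {
        long totalSeconds = d.getSeconds();
        return (totalSeconds / 60) + " min " + (totalSeconds % 60) + " s";
    }

    public static void main(String[] args) {
        // "applyDeletes took 3411845 msec"
        Duration reported = Duration.ofMillis(3_411_845L);
        // Log timestamps 13:42:03 and 14:38:55 on the same day
        Duration wallClock = Duration.between(
                LocalTime.of(13, 42, 3), LocalTime.of(14, 38, 55));

        System.out.println("reported:   " + format(reported));   // 56 min 51 s
        System.out.println("wall clock: " + format(wallClock));  // 56 min 52 s
    }
}
```

The two values agree to within a second, so the whole gap between the two
log lines was spent inside applyDeletes.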
