lucene-java-user mailing list archives

From Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
Subject BufferedUpdateStreams breaks high performance indexing
Date Thu, 28 Jul 2016 09:01:09 GMT
While trying to get higher indexing performance, I found that
BufferedUpdatesStream is breaking indexing performance. The method in question is:
public synchronized ApplyDeletesResult applyDeletesAndUpdates(...)

In IndexWriterConfig I have setRAMBufferSizeMB=1024, and the Lucene 4.10.4 API documentation states:
"Determines the amount of RAM that may be used for buffering added
documents and deletions before they are flushed to the Directory.
Generally for faster indexing performance it's best to flush by RAM
usage instead of document count and use as large a RAM buffer as you can."

I have also set setMaxBufferedDocs=-1 and setMaxBufferedDeleteTerms=-1.
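
For reference, the settings described above could be expressed roughly as follows. This is only a sketch against the Lucene 4.10.4 API, not the actual code behind this report: the class name IndexingConfigSketch, the choice of StandardAnalyzer, and routing the infoStream to System.out are assumptions made for illustration. The value -1 corresponds to the constant IndexWriterConfig.DISABLE_AUTO_FLUSH.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

// Hypothetical helper class; the name and the analyzer are assumptions.
public class IndexingConfigSketch {

    public static IndexWriterConfig build() {
        IndexWriterConfig cfg =
                new IndexWriterConfig(Version.LUCENE_4_10_4, new StandardAnalyzer());

        // Flush by RAM usage: buffer up to 1024 MB before flushing.
        cfg.setRAMBufferSizeMB(1024.0);

        // Disable flushing by document count (-1).
        cfg.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);

        // Disable flushing by number of buffered delete terms (-1).
        cfg.setMaxBufferedDeleteTerms(IndexWriterConfig.DISABLE_AUTO_FLUSH);

        // Assumption: diagnostics such as the "BD" lines are printed to stdout.
        cfg.setInfoStream(System.out);

        return cfg;
    }
}
```

The RAM buffer is set before the document-count trigger is disabled, so that at least one flush trigger is always enabled.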

BD 0 [Wed Jul 27 13:42:03 GMT+01:00 2016; Thread-27890]: applyDeletes: infos=...
BD 0 [Wed Jul 27 14:38:55 GMT+01:00 2016; Thread-27890]: applyDeletes took 3411845 msec

For about 56 minutes there was no indexing, only applying deletes.
What is it deleting?
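
The 56-minute figure can be checked against both the logged duration and the two timestamps. The following is a small sketch of that arithmetic using only the Java standard library; the class and method names are made up for illustration.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper for checking the numbers in the two "BD" log lines.
class ApplyDeletesDuration {

    // Formats a millisecond count, as printed in "applyDeletes took ... msec",
    // as whole minutes and remaining whole seconds.
    static String format(long millis) {
        Duration d = Duration.ofMillis(millis);
        long minutes = d.toMinutes();
        long seconds = d.minusMinutes(minutes).getSeconds();
        return minutes + " min " + seconds + " s";
    }

    // Wall-clock seconds between the two logged timestamps (same day, same zone).
    static long wallClockSeconds() {
        LocalTime start = LocalTime.of(13, 42, 3);
        LocalTime end = LocalTime.of(14, 38, 55);
        return Duration.between(start, end).getSeconds();
    }

    public static void main(String[] args) {
        System.out.println(format(3411845L));   // prints "56 min 51 s"
        System.out.println(wallClockSeconds()); // prints 3412
    }
}
```

The two values agree to within the one-second resolution of the log timestamps, so essentially the whole interval between the two lines was spent inside applyDeletes.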

As the index gets bigger, the time gets longer; currently I am waiting 2.5 hours.
I'm adding 96 million docs with unique IDs: no duplicates, only adds, no deletes.

Any suggestions as to which configuration is _really_ suited to high-performance indexing?

Best regards,
Bernd
