lucene-java-user mailing list archives

From Clemens Wyss DEV <clemens...@mysign.ch>
Subject "batch-update"-pattern, NoMergeScheduler?
Date Mon, 22 Dec 2014 09:45:42 GMT
One of our indexes is completely rebuilt quite frequently (a "batch update" or "re-index").

When this happens, more than 2 million documents are added to or updated in that index. This
creates an immense I/O load on our system. Does it make sense to set the merge scheduler to
NoMergeScheduler (and/or the MergePolicy to NoMergePolicy)? Or is merging "not relevant"
because the commit is done only at the very end?

Context information:
At the moment the writer's config sets only the merge policy and the RAM buffer size:
IndexWriterConfig config = new IndexWriterConfig( IndexManager.CURRENT_LUCENE_VERSION, analyzer );
config.setMergePolicy( NoMergePolicy.NO_COMPOUND_FILES );
//config.setMergeScheduler( NoMergeScheduler.INSTANCE );
config.setRAMBufferSizeMB( 20 );

The update logic is as follows:
indexWriter.deleteAll();
...
for all elements do {
    ...
    indexWriter.updateDocument( term, doc ); // update instead of add, to avoid duplicate entries
    ...
}
indexWriter.commit(); // single commit at the very end

What is the recommended way to perform such a batch update?