lucene-java-user mailing list archives

From Igor Shalyminov <>
Subject Re: Lucene multithreaded indexing problems
Date Fri, 22 Nov 2013 17:39:06 GMT

Thanks Uwe!

I changed the logic so that my workers only parse input docs into Documents, and the
indexWriter itself calls addDocuments() on chunks of 100 Documents.
Unfortunately, the behaviour still reproduces: memory usage gradually increases with the
number of processed documents, and at some point the program runs very slowly; it looks
like only a single thread remains active.
This happens after many parse/index cycles.

The current instance is now in the "single-thread" phase with ~100% CPU and 8397M RES
memory (the VM limit is -Xmx8G).
My question is: when does addDocuments() release the resources passed to it (the Documents themselves)?
Are the resources released when the call returns, or do I have to call indexWriter.commit()
after, say, each chunk?
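Roughly, the hand-off looks like this (a minimal sketch of the chunking pattern only; the Sink class below is an illustrative stand-in for Lucene's IndexWriter, not real Lucene API, and the chunk size and thread count are arbitrary):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class ChunkedIndexing {
    // Hypothetical stand-in for IndexWriter: addDocuments() is safe to call
    // from many threads, like Lucene's real addDocument()/addDocuments().
    static class Sink {
        final AtomicLong added = new AtomicLong();
        void addDocuments(List<String> docs) { added.addAndGet(docs.size()); }
    }

    public static void main(String[] args) throws Exception {
        final Sink writer = new Sink();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<?>> futures = new ArrayList<>();
        for (int w = 0; w < 4; w++) {
            futures.add(pool.submit(() -> {
                List<String> chunk = new ArrayList<>(100);
                for (int i = 0; i < 1000; i++) {
                    chunk.add("doc-" + i);            // "parse" a document
                    if (chunk.size() == 100) {
                        writer.addDocuments(chunk);   // hand off a full chunk
                        chunk = new ArrayList<>(100); // drop our references so the
                                                      // parsed docs can be GC'd
                    }
                }
                if (!chunk.isEmpty()) writer.addDocuments(chunk); // flush the tail
            }));
        }
        for (Future<?> f : futures) f.get(); // propagate any worker exception
        pool.shutdown();
        System.out.println(writer.added.get()); // 4 workers x 1000 docs = 4000
    }
}
```

The point of the fresh ArrayList after each hand-off is that the worker keeps no reference to already-submitted Documents, so whether they stay reachable afterwards depends only on the writer.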


21.11.2013, 19:59, "Uwe Schindler" <>:
> Hi,
> why are you doing this? Lucene's IndexWriter can handle addDocuments from multiple
> threads. And, since Lucene 4, it will process them almost completely in parallel!
> If you do the addDocuments single-threaded, you are adding an extra bottleneck to
> your application. If you synchronize on IndexWriter (which I hope you do not),
> things will go wrong, too.
> Uwe
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> eMail:
>>  -----Original Message-----
>>  From: Igor Shalyminov []
>>  Sent: Thursday, November 21, 2013 4:45 PM
>>  To:
>>  Subject: Lucene multithreaded indexing problems
>>  Hello!
>>  I tried to perform indexing in multiple threads, using a FixedThreadPool of
>>  Callable workers.
>>  The main operation - parsing a single document and adding it to the index via
>>  addDocument() - is done by a single worker.
>>  After parsing a document, a lot (really a lot) of Strings appear, and at the
>>  end of the worker's call() all of them go to the indexWriter.
>>  I use no merging; the resources are flushed to disk when the segment size
>>  limit is reached.
>>  The problem is, after a little while (when most of the heap memory is
>>  used) the indexer makes no progress, and CPU load is a constant 100% (no
>>  difference whether there are 2 threads or 32). So I think at some point garbage
>>  collection drags the whole indexing process down.
>>  Could you please give some advice on proper concurrent indexing with
>>  Lucene?
>>  Can there be "memory leaks" somewhere in the indexWriter? Maybe I must
>>  perform some operations on the writer from time to time to release unused
>>  resources?
>>  --
>>  Best Regards,
>>  Igor
>>  ---------------------------------------------------------------------
>>  To unsubscribe, e-mail:
>>  For additional commands, e-mail:

