lucene-java-user mailing list archives

From Igor Shalyminov <ishalymi...@yandex-team.ru>
Subject Re: Lucene multithreaded indexing problems
Date Sat, 23 Nov 2013 15:45:30 GMT
So we are back to the setup I described initially: multiple parallel workers, each doing "parse
+ indexWriter.addDocument()" for single documents, with no synchronization on my side. As I
reported, that setup also had problems with memory consumption and thread blocking.

Or did I misunderstand you?
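
For reference, a minimal sketch of the setup in question: a fixed pool of workers, each parsing one document and adding it to a shared writer with no external synchronization. It is plain Java so that it runs without Lucene on the classpath. `DocSink`, `parse` and `indexAll` are hypothetical names introduced only for illustration; in the real application the sink would be the shared `IndexWriter` and the field list a Lucene `Document`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for the single IndexWriter call the workers need. */
interface DocSink {
    void addDocument(List<String> fields) throws Exception;
}

class ParallelIndexingSketch {

    /** Placeholder parser (an assumption): splits raw text on whitespace. */
    static List<String> parse(String raw) {
        return Arrays.asList(raw.trim().split("\\s+"));
    }

    /**
     * Submits one task per raw document. Each task parses its document and
     * adds it to the shared sink; there is no synchronized block around the sink.
     * Returns the number of tasks that completed.
     */
    static int indexAll(List<String> rawDocs, int threads, DocSink sink) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String raw : rawDocs) {
                pending.add(pool.submit(() -> {
                    sink.addDocument(parse(raw)); // parse + add, one document per task
                    return null;                  // Callable form, so checked exceptions propagate
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the task and rethrow any worker failure
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // A thread-safe queue plays the role of the index in this sketch.
        ConcurrentLinkedQueue<List<String>> indexed = new ConcurrentLinkedQueue<>();
        int submitted = indexAll(Arrays.asList("a b", "c d e", "f"), 4, indexed::add);
        System.out.println(submitted + " submitted, " + indexed.size() + " indexed");
    }
}
```

This only shows the shape of the setup; it does not explain the memory growth. One thing worth keeping in mind is that `Executors.newFixedThreadPool` queues submitted tasks without bound, so whatever each waiting task references stays on the heap until the task runs. Whether that plays any role here is not established.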

-- 
Igor

22.11.2013, 23:34, "Uwe Schindler" <uwe@thetaphi.de>:
> Hi,
> Don't use addDocuments. That method is meant for so-called block indexing (where
> all documents need to be in one block for block joins). Call addDocument for each document,
> possibly from many threads. That way Lucene can handle multithreading better and free memory
> early. There is really no need for bulk adds; they exist solely for block joins, where docs
> need to be sequential and without gaps.
>
> Uwe
>
> Igor Shalyminov <ishalyminov@yandex-team.ru> wrote:
>
>> - uwe@
>>
>> Thanks Uwe!
>>
>> I changed the logic so that my workers only parse input docs into
>> Documents, and the indexWriter itself does addDocuments() on chunks
>> of 100 Documents.
>> Unfortunately, the behaviour still reproduces: memory usage grows
>> slightly with the number of processed documents, and at some point the
>> program runs very slowly, and it seems that only a single thread is
>> active.
>> It happens after lots of parse/index cycles.
>>
>> The current instance is now in the "single-thread" phase with ~100% CPU
>> and with 8397M RES memory (limit for the VM is -Xmx8G).
>> My question is, when does addDocuments() release all the resources passed
>> in (the Documents themselves)?
>> Are the resources released when the function call finishes, or do I have
>> to call indexWriter.commit() after, say, each chunk?
>>
>> --
>> Igor
>>
>> 21.11.2013, 19:59, "Uwe Schindler" <uwe@thetaphi.de>:
>>>  Hi,
>>>
>>>  why are you doing this? Lucene's IndexWriter can handle addDocuments
>>>  in multiple threads. And, since Lucene 4, it will process them almost
>>>  completely in parallel!
>>>  If you do the addDocuments single-threaded, you are adding an
>>>  additional bottleneck to your application. If you are synchronizing
>>>  on IndexWriter (which I hope you will not do), things
>>>  will go wrong, too.
>>>  Uwe
>>>
>>>  -----
>>>  Uwe Schindler
>>>  H.-H.-Meier-Allee 63, D-28213 Bremen
>>>  http://www.thetaphi.de
>>>  eMail: uwe@thetaphi.de
>>>>   -----Original Message-----
>>>>   From: Igor Shalyminov [mailto:ishalyminov@yandex-team.ru]
>>>>   Sent: Thursday, November 21, 2013 4:45 PM
>>>>   To: java-user@lucene.apache.org
>>>>   Subject: Lucene multithreaded indexing problems
>>>>
>>>>   Hello!
>>>>
>>>>   I tried to perform multithreaded indexing, with a FixedThreadPool of
>>>>   Callable workers.
>>>>   The main operation - parsing a single document and calling addDocument() on the
>>>>   index - is done by a single worker.
>>>>   Parsing a document produces a lot (really a lot) of Strings, and at the
>>>>   end of the worker's call() all of them go to the indexWriter.
>>>>   I use no merging; the resources are flushed to disk when the segment size
>>>>   limit is reached.
>>>>
>>>>   The problem is that after a little while (when most of the heap memory is
>>>>   used) the indexer makes no progress, and CPU load is a constant 100% (no
>>>>   difference whether there are 2 threads or 32). So I think at some point garbage
>>>>   collection takes the whole indexing process down.
>>>>
>>>>   Could you please give some advice on proper concurrent indexing with
>>>>   Lucene?
>>>>   Can there be "memory leaks" somewhere in the indexWriter? Maybe I must
>>>>   perform some operations on the writer to release unused resources from time
>>>>   to time?
>>>>
>>>>   --
>>>>   Best Regards,
>>>>   Igor
>>>>  ---------------------------------------------------------------------
>>>>   To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>   For additional commands, e-mail: java-user-help@lucene.apache.org
>
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de


