From: Igor Shalyminov
To: "java-user@lucene.apache.org"
Subject: Re: Lucene multithreaded indexing problems
Date: Fri, 22 Nov 2013 21:39:06 +0400
Message-Id: <3681385141946@webcorp1g.yandex-team.ru>
In-Reply-To: <02df01cee6d2$9292add0$b7b80970$@thetaphi.de>
References: <27461385048705@webcorp2h.yandex-team.ru> <02df01cee6d2$9292add0$b7b80970$@thetaphi.de>

Thanks Uwe!

I changed the logic so that my workers only parse input docs into Documents, and the indexWriter calls addDocuments() itself for chunks of 100 Documents.
Unfortunately, the behaviour still reproduces: memory usage increases slightly with the number of processed documents, and at some point the program runs very slowly, and it seems that only a single thread is active.
It happens after lots of parse/index cycles.

The current instance is now in the "single-thread" phase with ~100% CPU and 8397M RES memory (the limit for the VM is -Xmx8G).
My question is: when does addDocuments() release all the resources passed in (the Documents themselves)?
Are the resources released when the function call finishes, or do I have to call indexWriter.commit() after, say, each chunk?

--
Igor

21.11.2013, 19:59, "Uwe Schindler":
> Hi,
>
> why are you doing this? Lucene's IndexWriter can handle addDocuments in multiple threads. And, since Lucene 4, it will process them almost completely in parallel!
> If you do the addDocuments single-threaded you are adding an additional bottleneck in your application.
> If you are synchronizing on IndexWriter (which I hope you will not do), things will go wrong, too.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>> -----Original Message-----
>> From: Igor Shalyminov [mailto:ishalyminov@yandex-team.ru]
>> Sent: Thursday, November 21, 2013 4:45 PM
>> To: java-user@lucene.apache.org
>> Subject: Lucene multithreaded indexing problems
>>
>> Hello!
>>
>> I tried to perform indexing in multiple threads, with a FixedThreadPool of
>> Callable workers.
>> The main operation - parsing a single document and calling addDocument() on
>> the index - is done by a single worker.
>> After parsing a document, a lot (really a lot) of Strings appear, and at the
>> end of the worker's call() all of them go to the indexWriter.
>> I use no merging; the resources are flushed to disk when the segment size
>> limit is reached.
>>
>> The problem is that after a little while (when most of the heap memory is
>> used) the indexer makes no progress, and the CPU load is a constant 100% (no
>> difference whether there are 2 threads or 32). So I think at some point
>> garbage collection takes the whole indexing process down.
>>
>> Could you please give some advice on proper concurrent indexing with
>> Lucene?
>> Can there be "memory leaks" somewhere in the indexWriter? Maybe I must
>> perform some operations on the writer to release unused resources from time
>> to time?
>>
>> --
>> Best Regards,
>> Igor
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
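
For illustration only, and not taken from any message in this thread: a minimal sketch of the arrangement Uwe recommends, in which each worker both parses a document and hands it to one shared writer. Everything named here (DocSink, CountingSink, BoundedIndexingSketch, parse, indexAll, maxInFlight) is hypothetical. DocSink stands in for a single shared IndexWriter so that the sketch runs without Lucene on the classpath. The Semaphore is an added assumption rather than something proposed in the thread: it caps how many submitted-but-unfinished tasks, and therefore how many parsed documents, can sit on the heap at once. Whether it would cure the slowdown described above is not established here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the shared writer. In a real program this role
// would be played by one IndexWriter instance shared by all workers.
interface DocSink {
    void add(List<String> fields);
}

// Only counts documents, so the sketch can run without Lucene.
final class CountingSink implements DocSink {
    private final AtomicInteger count = new AtomicInteger();

    @Override
    public void add(List<String> fields) {
        count.incrementAndGet();
    }

    int count() {
        return count.get();
    }
}

final class BoundedIndexingSketch {

    // Hypothetical parser: splits one raw input into the strings to index.
    static List<String> parse(String raw) {
        List<String> fields = new ArrayList<>();
        for (String token : raw.split("\\s+")) {
            if (!token.isEmpty()) {
                fields.add(token);
            }
        }
        return fields;
    }

    // Parses and adds every input on `threads` workers while allowing at most
    // `maxInFlight` submitted-but-unfinished tasks. Returns the number added.
    static int indexAll(List<String> inputs, int threads, int maxInFlight) {
        CountingSink sink = new CountingSink();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Semaphore permits = new Semaphore(maxInFlight);
        try {
            for (String raw : inputs) {
                permits.acquire(); // producer blocks when workers fall behind
                pool.execute(() -> {
                    try {
                        sink.add(parse(raw)); // parse and add in the same worker
                    } finally {
                        permits.release(); // the task holds no parsed data after this
                    }
                });
            }
            pool.shutdown();
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // no-op if the pool has already terminated
        }
        return sink.count();
    }
}
```

The tasks are submitted with execute() rather than collected as Futures, so nothing in the sketch keeps a reference to a parsed document once its task has finished.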