Date: Wed, 3 Aug 2011 19:31:32 -0400
Subject: Re: Thread locking while merging (ConcurrentMergeScheduler issue?)
From: "Devon H. O'Dell"
To: java-user@lucene.apache.org

For what it's worth, I've seen this happen too (using the stock Lucene
3.3 Java APIs), but it requires me to index many millions of documents,
and doesn't start being a really big problem until the indexes get to
be closer to 250GB in size. When they reach around 1TB, it will take
around an hour for the merge to complete (which is frustrating).

Similar to Pierre-Henri, I see virtually no disk I/O when it happens
and the system in question is one of the Amazon EC2 "Huge" instances
(so, something like 8 cores and 32GB RAM) and disk I/O during indexing
pushes around 100MB/s.

If it would be useful to see additional reports / information from
this scenario, I'm sure I can get something put together.

--dho

2011/8/3 Pierre-Henri Toussaint:
> OK so the problem definitely comes from the slow merging.
> I slightly increased the number merge count and thread to avoid the problem
> described previously. But as expected, it just delayed it !
>
> results : 75 minutes to index the 33GB xml file, and 150 minutes to finish
> the merge after indexer.close.
> See uploaded http://lucene.472066.n3.nabble.com/file/n3223874/slowmerge log
> file containing: logs (timems:numberofdocsindexed/current_title) +
> infoStream + random threaddump.
> You can spot "indexer.close (no optimize)" (line 5721) for indexing
> completion and the beginning of merging nightmare.
>
> *conf :*
> conf.setRAMBufferSizeMB(512);
> ConcurrentMergeScheduler mergeScheduler = new ConcurrentMergeScheduler();
> mergeScheduler.setMaxMergeCount(6);
> mergeScheduler.setMaxThreadCount(4);
> conf.setMergeScheduler(mergeScheduler);
> writer = new ThreadedIndexWriter(directory, analyzer, true, 2, 5, conf);
>>>everything else default. no optimize called
>
> *documents :*
> pageDocument.add(new Field("title", page.getTitle(), Field.Store.YES,
>     Field.Index.NO));
> pageDocument.add(new Field("text", page.getText(), Field.Store.NO,
>     Field.Index.ANALYZED));
> if (page.getContributorUserName() != null)
>     pageDocument.add(new Field("contributorUserName",
>         page.getContributorUserName(), Field.Store.NO, Field.Index.ANALYZED));
>
> *infoStream info :*
> setInfoStream
> deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@2dafae45
> dir=org.apache.lucene.store.NIOFSDirectory@/Users/ptoussaint/Documents/workspace/wikisearch/index2
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@39dd3812
> index=
> version=4.0-SNAPSHOT
> matchVersion=LUCENE_40
> analyzer=org.pache.soundcloud.wikisearch.Indexer$WikiAnalyzer
> delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
> commit=null
> openMode=CREATE_OR_APPEND
> similarityProvider=org.apache.lucene.search.DefaultSimilarityProvider
> termIndexInterval=32
> mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler
> default WRITE_LOCK_TIMEOUT=1000
> writeLockTimeout=1000
> maxBufferedDeleteTerms=-1
> ramBufferSizeMB=512.0
> maxBufferedDocs=-1
> mergedSegmentWarmer=null
> codecProvider=org.apache.lucene.index.codecs.CoreCodecProvider@6a8c436b
> mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10,
> maxMergeAtOnceExplicit=30, maxMergedSegmentMB=5120.0, floorSegmentMB=2.0,
> expungeDeletesPctAllowed=10.0, segmentsPerTier=10.0, useCompoundFile=true,
> noCFSRatio=0.1
> indexerThreadPool=org.apache.lucene.index.ThreadAffinityDocumentsWriterThreadPool@1e9e5c73
> readerPooling=false
> readerTermsIndexDivisor=1
> flushPolicy=org.apache.lucene.index.FlushByRamOrCountsPolicy@2ec791b9
> perThreadHardLimitMB=1945
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Thread-locking-while-merging-ConcurrentMergeScheduler-issue-tp3222427p3223874.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
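
On the offer of additional reports: one low-effort thing to capture is
a thread dump taken from inside the indexing JVM while a merge appears
stalled, similar to the "random threaddump" in the quoted log. The
sketch below is only an illustration and makes no Lucene calls at all;
it uses the standard java.lang.management API. The class name
ThreadDumpSketch and the method dumpAllThreads are hypothetical names
chosen for this example. For each live thread it records the state,
the lock the thread is waiting on (if any), and the stack.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Lucene: collects a plain-text dump of
// every live thread in the current JVM.
class ThreadDumpSketch {

    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // (false, false): skip locked-monitor and locked-synchronizer
        // details, which are optional features of the JVM.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append('"')
              .append(" state=").append(info.getThreadState());
            // getLockName() is null when the thread is not blocked or waiting.
            if (info.getLockName() != null) {
                sb.append(" waitingOn=").append(info.getLockName());
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Calling dumpAllThreads() every few minutes during the slow merge and
logging the result next to the infoStream output would show whether
the merge threads are blocked on a lock or simply running.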