From: Doron Cohen
To: java-dev@lucene.apache.org
Subject: Re: flushRamSegments possible perf improvement?
Date: Wed, 18 Oct 2006 23:44:04 -0800

Ok, I tested this approach. The code is not clean yet, just enough to test
whether there is indeed potential for improvement here, and I think there is.

Performance results for the (short) tests I ran on my everyday machine:
(read as: [oldTimeMillis] to [newTimeMillis] is [speed-up] for adding:
n docs, maxBuffered=x mergeFactor=y)

--- "new" runs before "old" ---
 3605 to  2964 is 17% for:   500 docs, buf=10   mrg=3
 2163 to  1923 is 11% for:  2000 docs, buf=100  mrg=4
 6990 to  5759 is 17% for:  8000 docs, buf=200  mrg=5
20529 to 18286 is 10% for: 32000 docs, buf=400  mrg=6
44444 to 39677 is 10% for: 64000 docs, buf=1000 mrg=7

--- "old" runs before "new" ---
 3926 to  2434 is 38% for:   500 docs, buf=10   mrg=3
 2233 to  1732 is 22% for:  2000 docs, buf=100  mrg=4
 6199 to  5678 is  8% for:  8000 docs, buf=200  mrg=5
20139 to 16955 is 15% for: 32000 docs, buf=400  mrg=6
42220 to 39507 is  6% for: 64000 docs, buf=1000 mrg=7

I will submit this in a Jira issue.

Thoughts, anyone? Are there any other particular settings you think should
be tested?

- Doron

Doron Cohen/Haifa/IBM@IBMIL wrote on 18/10/2006 15:29:26:

>
> Currently IndexWriter.flushRamSegments() always merges all ram segments to
> disk. Later it may merge more, depending on the maybe-merge algorithm. This
> happens when closing the index and when the number of (1 doc) (ram) segments
> exceeds max-buffered-docs.
>
> Can there be a performance penalty for always merging to disk first?
>
> Assume the following merges take place:
>   merging segments _ram_0 (1 docs) _ram_1 (1 docs) ... _ram_N (1 docs) into
>   _a (N docs)
>   merging segments _6 (M docs) _7 (K docs) _8 (L docs) _a (N docs) into
>   _b (N+M+K+L docs)
>
> Alternatively, we could compute in advance that this is going to happen, and
> perform a single merge:
>   merging segments _ram_0 (1 docs) _ram_1 (1 docs) ... _ram_N (1 docs)
>   _6 (M docs) _7 (K docs) _8 (L docs) into _b (N+M+K+L docs)
>
> This would save writing the segment of size N to disk and reading it again.
> For large enough N, is there really a potential saving here?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
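
To make it easier to see where the saving in the quoted proposal would come from, here is a minimal sketch in Java. It is not Lucene code: the class MergeCostSketch, its Cost type, and both methods are hypothetical names made up for illustration. It counts documents written to and read from disk as a rough stand-in for I/O, and it assumes that the second merge in the two-step plan also reads the intermediate segment _a, as the N+M+K+L total implies.

```java
// Illustrative sketch only: a document-count I/O model, not Lucene's merge code.
// All names here are hypothetical.
public class MergeCostSketch {

    /** Disk traffic counted in documents (a rough proxy for bytes). */
    public static final class Cost {
        public final long docsWritten;
        public final long docsRead;

        public Cost(long docsWritten, long docsRead) {
            this.docsWritten = docsWritten;
            this.docsRead = docsRead;
        }

        public long total() {
            return docsWritten + docsRead;
        }
    }

    private static long sum(long[] sizes) {
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        return total;
    }

    /**
     * Current behaviour as described in the thread: flush the N buffered
     * documents to an intermediate disk segment, then merge that segment
     * with the existing disk segments.
     */
    public static Cost twoStep(long ramDocs, long[] diskSegmentDocs) {
        long diskDocs = sum(diskSegmentDocs);
        long written = ramDocs + (ramDocs + diskDocs); // intermediate segment, then final segment
        long read = ramDocs + diskDocs;                // intermediate segment is read back from disk
        return new Cost(written, read);
    }

    /**
     * Proposed behaviour: merge the buffered documents and the disk
     * segments in one pass. Buffered documents come from memory, so
     * they are not counted as disk reads.
     */
    public static Cost singleStep(long ramDocs, long[] diskSegmentDocs) {
        long diskDocs = sum(diskSegmentDocs);
        return new Cost(ramDocs + diskDocs, diskDocs);
    }

    public static void main(String[] args) {
        long n = 100;                     // N buffered documents (assumed value)
        long[] disk = {100, 100, 100};    // M, K, L (assumed values)
        Cost before = twoStep(n, disk);
        Cost after = singleStep(n, disk);
        System.out.println("two-step total doc transfers:    " + before.total());
        System.out.println("single-step total doc transfers: " + after.total());
        System.out.println("saved:                           " + (before.total() - after.total()));
    }
}
```

In this model the single merge always saves exactly 2N document transfers (N written and N read back), whatever M, K and L are. Real savings depend on document size, the file system cache, and how often a flush actually triggers a further merge, so the measured timings in the message remain the better guide.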