Message-ID: <182899.1180353857950.JavaMail.jira@brutus>
Date: Mon, 28 May 2007 05:04:17 -0700 (PDT)
From: "Michael McCandless (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-888) Improve indexing performance by increasing internal buffer sizes
In-Reply-To: <28979694.1179926596715.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/LUCENE-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499548 ]

Michael McCandless commented on LUCENE-888:
-------------------------------------------

I re-ran the "second test" above, but this time with compound file
turned off.

Baseline (trunk) = 1 K buffers for all 3. New = 16 K for
BufferedIndexOutput, 16 K for CompoundFileWriter and 4 K for
BufferedIndexInput.

I ran each test 4 times & took the best time.

Quad core Mac OS X on 4-drive RAID 0
  baseline  553 sec
  new       499 sec -> 10% faster

Dual core Debian Linux (2.6.18 kernel) on 6 drive RAID 5
  baseline  590 sec
  new       548 sec -> 7% faster

Windows XP Pro laptop, single drive
  baseline 1078 sec
  new       918 sec -> 15% faster

Quick observations:

  * Still a healthy 7-15% overall gain even without compound file by
    increasing the buffer sizes.

  * The overall performance gain on the trunk just by turning off
    compound file ranges from 7-33% (33% gain = the Windows XP
    Laptop).

OK I plan to commit this soon.

> Improve indexing performance by increasing internal buffer sizes
> ----------------------------------------------------------------
>
>                 Key: LUCENE-888
>                 URL: https://issues.apache.org/jira/browse/LUCENE-888
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-888.patch, LUCENE-888.take2.patch
>
>
> In working on LUCENE-843, I noticed that two buffer sizes have a
> substantial impact on overall indexing performance.
> First is BufferedIndexOutput.BUFFER_SIZE (also used by
> BufferedIndexInput). Second is CompoundFileWriter's buffer used to
> actually build the compound file. Both are now 1 KB (1024 bytes).
> I ran the same indexing test I'm using for LUCENE-843. I'm indexing
> ~5,500 byte plain text docs derived from the Europarl corpus
> (English). I index 200,000 docs with compound file enabled and term
> vector positions & offsets stored plus stored fields. I flush
> documents at 16 MB RAM usage, and I set maxBufferedDocs carefully to
> not hit LUCENE-845. The resulting index is 1.7 GB. The index is not
> optimized in the end and I left mergeFactor @ 10.
> I ran the tests on a quad-core OS X 10 machine with 4-drive RAID 0 IO
> system.
> At 1 KB (current Lucene trunk) it takes 622 sec to build the index; if
> I increase both buffers to 8 KB it takes 554 sec to build the index,
> which is an 11% overall gain!
> I will run more tests to see if there is a natural knee in the curve
> (buffer size above which we don't really gain much more performance).
> I'm guessing we should leave BufferedIndexInput's default BUFFER_SIZE
> at 1024, at least for now. During searching there can be quite a few
> of this class instantiated, and likely a larger buffer size for the
> freq/prox streams could actually hurt search performance for those
> searches that use skipping.
> The CompoundFileWriter buffer is created only briefly, so I think we
> can use a fairly large (32 KB?) buffer there. And there should not be
> too many BufferedIndexOutputs alive at once so I think a large-ish
> buffer (16 KB?) should be OK.
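
The "% faster" figures in this thread appear to be the reduction in
wall-clock time relative to the baseline run, i.e.
(baseline - new) / baseline. A minimal Java sketch that recomputes them
from the reported timings follows. The class and method names are
illustrative only and are not part of Lucene or the attached patches.

```java
// Illustrative helper (hypothetical names, not Lucene code): recomputes the
// percentage gains from the timings reported in this thread.
class BufferGain {

    /** Percent reduction in elapsed time relative to the baseline run. */
    static double percentFaster(double baselineSec, double newSec) {
        return 100.0 * (baselineSec - newSec) / baselineSec;
    }

    public static void main(String[] args) {
        String[] runs = {
            "Mac OS X, 4-drive RAID 0 (no compound file)",
            "Debian Linux, 6 drive RAID 5 (no compound file)",
            "Windows XP laptop, single drive (no compound file)",
            "Mac OS X, 1 KB -> 8 KB buffers (compound file on)"
        };
        // {baseline seconds, new seconds}, copied from the messages above.
        double[][] timings = {
            {553, 499},
            {590, 548},
            {1078, 918},
            {622, 554}
        };
        for (int i = 0; i < runs.length; i++) {
            double gain = percentFaster(timings[i][0], timings[i][1]);
            System.out.printf("%-52s %.1f%% faster%n", runs[i], gain);
        }
    }
}
```

Rounded to whole percentages, the results agree with the 10%, 7%, 15%
and 11% quoted in the thread.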