From: "Yonik Seeley"
To: java-dev@lucene.apache.org
Date: Tue, 3 Apr 2007 10:54:24 -0400
Subject: Re: improve how IndexWriter uses RAM to buffer added documents

Wow, very nice results Mike!

-Yonik

On 4/3/07, Michael McCandless (JIRA) wrote:
>
>     [ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486335 ]
>
> Michael McCandless commented on LUCENE-843:
> -------------------------------------------
>
> Last is the results for small docs (100 tokens = ~550 bytes plain text each):
>
>   2000000 DOCS @ ~550 bytes plain text
>   RAM = 32 MB
>   NUM THREADS = 1
>   MERGE FACTOR = 10
>
>
>     No term vectors nor stored fields
>
>       AUTOCOMMIT = true (commit whenever RAM is full)
>
>         old
>           2000000 docs in 886.7 secs
>           index size = 438M
>
>         new
>           2000000 docs in 230.5 secs
>           index size = 435M
>
>         Total Docs/sec:             old  2255.6; new  8676.4 [  284.7% faster]
>         Docs/MB @ flush:            old   128.0; new  4194.6 [ 3176.2% more]
>         Avg RAM used (MB) @ flush:  old   107.3; new    37.7 [   64.9% less]
>
>
>       AUTOCOMMIT = false (commit only once at the end)
>
>         old
>           2000000 docs in 888.7 secs
>           index size = 438M
>
>         new
>           2000000 docs in 239.6 secs
>           index size = 432M
>
>         Total Docs/sec:             old  2250.5; new  8348.7 [  271.0% faster]
>         Docs/MB @ flush:            old   128.0; new  4146.8 [ 3138.9% more]
>         Avg RAM used (MB) @ flush:  old   108.1; new    38.9 [   64.0% less]
>
>
>     With term vectors (positions + offsets) and 2 small stored fields
>
>       AUTOCOMMIT = true (commit whenever RAM is full)
>
>         old
>           2000000 docs in 1480.1 secs
>           index size = 2.1G
>
>         new
>           2000000 docs in 462.0 secs
>           index size = 2.1G
>
>         Total Docs/sec:             old  1351.2; new  4329.3 [  220.4% faster]
>         Docs/MB @ flush:            old    93.1; new  4194.6 [ 4405.7% more]
>         Avg RAM used (MB) @ flush:  old   296.4; new    38.3 [   87.1% less]
>
>
>       AUTOCOMMIT = false (commit only once at the end)
>
>         old
>           2000000 docs in 1489.4 secs
>           index size = 2.1G
>
>         new
>           2000000 docs in 347.9 secs
>           index size = 2.1G
>
>         Total Docs/sec:             old  1342.8; new  5749.4 [  328.2% faster]
>         Docs/MB @ flush:            old    93.1; new  4146.8 [ 4354.5% more]
>         Avg RAM used (MB) @ flush:  old   297.1; new    38.6 [   87.0% less]
>
>
>
>   200000 DOCS @ ~5,500 bytes plain text
>
>
>     No term vectors nor stored fields
>
>       AUTOCOMMIT = true (commit whenever RAM is full)
>
>         old
>           200000 docs in 397.6 secs
>           index size = 415M
>
>         new
>           200000 docs in 167.5 secs
>           index size = 411M
>
>         Total Docs/sec:             old   503.1; new  1194.1 [  137.3% faster]
>         Docs/MB @ flush:            old    81.6; new   406.2 [  397.6% more]
>         Avg RAM used (MB) @ flush:  old    87.3; new    35.2 [   59.7% less]
>
>
>       AUTOCOMMIT = false (commit only once at the end)
>
>         old
>           200000 docs in 394.6 secs
>           index size = 415M
>
>         new
>           200000 docs in 168.4 secs
>           index size = 408M
>
>         Total Docs/sec:             old   506.9; new  1187.7 [  134.3% faster]
>         Docs/MB @ flush:            old    81.6; new   432.2 [  429.4% more]
>         Avg RAM used (MB) @ flush:  old   126.6; new    36.9 [   70.8% less]
>
>
>     With term vectors (positions + offsets) and 2 small stored fields
>
>       AUTOCOMMIT = true (commit whenever RAM is full)
>
>         old
>           200000 docs in 754.2 secs
>           index size = 1.7G
>
>         new
>           200000 docs in 304.9 secs
>           index size = 1.7G
>
>         Total Docs/sec:             old   265.2; new   656.0 [  147.4% faster]
>         Docs/MB @ flush:            old    46.7; new   406.2 [  769.6% more]
>         Avg RAM used (MB) @ flush:  old    92.9; new    35.2 [   62.1% less]
>
>
>       AUTOCOMMIT = false (commit only once at the end)
>
>         old
>           200000 docs in 743.9 secs
>           index size = 1.7G
>
>         new
>           200000 docs in 244.3 secs
>           index size = 1.7G
>
>         Total Docs/sec:             old   268.9; new   818.7 [  204.5% faster]
>         Docs/MB @ flush:            old    46.7; new   432.2 [  825.2% more]
>         Avg RAM used (MB) @ flush:  old    93.0; new    36.6 [   60.6% less]
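
For readers checking the figures in the quoted results, the bracketed percentages appear to be plain relative changes between the "old" and "new" columns. Below is a minimal sketch in Python, assuming "faster" and "more" mean (new / old - 1) * 100 and "less" means (1 - new / old) * 100. The function names are hypothetical and are not part of Lucene or of the benchmark harness used in LUCENE-843.

```python
def docs_per_sec(num_docs, secs):
    """Indexing throughput: documents indexed divided by wall-clock seconds."""
    return num_docs / secs


def pct_increase(old, new):
    """Relative gain of new over old, in percent (assumed meaning of 'faster' / 'more')."""
    return (new / old - 1.0) * 100.0


def pct_decrease(old, new):
    """Relative reduction from old to new, in percent (assumed meaning of 'less')."""
    return (1.0 - new / old) * 100.0


if __name__ == "__main__":
    # Figures taken from the first quoted case:
    # 2000000 small docs, no term vectors nor stored fields, AUTOCOMMIT = true.
    print("old docs/sec: %.1f" % docs_per_sec(2_000_000, 886.7))
    print("throughput gain: %.1f%%" % pct_increase(2255.6, 8676.4))
    print("RAM reduction at flush: %.1f%%" % pct_decrease(107.3, 37.7))
```

Recomputing from the rounded numbers in the email will not reproduce every reported figure to the last digit; small differences are expected if the originals were derived from unrounded measurements.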