Message-ID: <1753829384.1211043115916.JavaMail.jira@brutus>
Date: Sat, 17 May 2008 09:51:55 -0700 (PDT)
From: "Michael McCandless (JIRA)"
To: java-dev@lucene.apache.org
Reply-To: java-dev@lucene.apache.org
Subject: [jira] Resolved: (LUCENE-1283) Factor out ByteSliceWriter from DocumentsWriterFieldData
In-Reply-To: <2009114436.1210415755798.JavaMail.jira@brutus>


     [ https://issues.apache.org/jira/browse/LUCENE-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1283.
----------------------------------------

    Resolution: Fixed

> Factor out ByteSliceWriter from DocumentsWriterFieldData
> --------------------------------------------------------
>
>                 Key: LUCENE-1283
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1283
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.3, 2.3.1
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: LUCENE-1283.patch
>
>
> DocumentsWriter uses byte slices into shared byte[]'s to hold the
> growing postings data for many different terms in memory.  This is
> probably the trickiest (most confusing) part of DocumentsWriter.
> Right now it's not cleanly factored out and not easy to separately
> test.  In working on this issue:
>   http://mail-archives.apache.org/mod_mbox/lucene-java-user/200805.mbox/%3c126142c0805061426n1168421ya5594ef854fae5e4@mail.gmail.com%3e
> which eventually turned out to be a bug in Oracle JRE's JIT compiler,
> I factored out ByteSliceWriter and created a unit test to stress test
> the writing & reading of byte slices.  The test just randomly writes N
> streams interleaved into shared byte[]'s, then reads them back
> verifying the results are correct.
> I created the stress test to try to find any bugs in that code.  The
> test ran fine (no bugs were found) but I think the refactoring is
> still very much worthwhile.
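
To make the stress test described above concrete, here is a minimal Java sketch of the idea: several streams are written one byte at a time, in random order, into a single shared byte[], then each stream is read back and compared with what was written. This is not the code from LUCENE-1283.patch. The class name InterleavedSliceStress, the fixed 16-byte slice size, and the 4-byte forwarding address stored at the end of each slice are all assumptions made for illustration; the real slice layout in DocumentsWriter may differ.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;

/** Hypothetical, simplified stand-in for byte slices in a shared byte[]. */
final class InterleavedSliceStress {
    static final int SLICE_SIZE = 16;          // assumption: every slice has the same size
    static final int PAYLOAD = SLICE_SIZE - 4; // last 4 bytes hold the next slice's address

    private byte[] pool = new byte[SLICE_SIZE * 4]; // buffer shared by all streams
    private int poolUsed = 0;
    private final int[] first, current, used, length; // per-stream state

    InterleavedSliceStress(int numStreams) {
        first = new int[numStreams];
        current = new int[numStreams];
        used = new int[numStreams];
        length = new int[numStreams];
        for (int s = 0; s < numStreams; s++) {
            first[s] = current[s] = newSlice();
        }
    }

    /** Reserves a slice at the end of the pool, growing the pool if needed. */
    private int newSlice() {
        if (poolUsed + SLICE_SIZE > pool.length) {
            pool = Arrays.copyOf(pool, pool.length * 2);
        }
        int start = poolUsed;
        poolUsed += SLICE_SIZE;
        return start;
    }

    /** Appends one byte to stream s, chaining a new slice when the current one is full. */
    void write(int s, byte b) {
        if (used[s] == PAYLOAD) {
            int next = newSlice();
            int at = current[s] + PAYLOAD;
            pool[at] = (byte) (next >>> 24);
            pool[at + 1] = (byte) (next >>> 16);
            pool[at + 2] = (byte) (next >>> 8);
            pool[at + 3] = (byte) next;
            current[s] = next;
            used[s] = 0;
        }
        pool[current[s] + used[s]] = b;
        used[s]++;
        length[s]++;
    }

    /** Reads stream s back by following the forwarding addresses from its first slice. */
    byte[] read(int s) {
        byte[] out = new byte[length[s]];
        int slice = first[s];
        int pos = 0;
        for (int i = 0; i < out.length; i++) {
            if (pos == PAYLOAD) {
                int at = slice + PAYLOAD;
                slice = ((pool[at] & 0xFF) << 24) | ((pool[at + 1] & 0xFF) << 16)
                      | ((pool[at + 2] & 0xFF) << 8) | (pool[at + 3] & 0xFF);
                pos = 0;
            }
            out[i] = pool[slice + pos++];
        }
        return out;
    }

    /** Writes numWrites random bytes to randomly chosen streams, then verifies every stream. */
    static boolean roundTrip(long seed, int numStreams, int numWrites) {
        Random random = new Random(seed);
        InterleavedSliceStress slices = new InterleavedSliceStress(numStreams);
        ByteArrayOutputStream[] expected = new ByteArrayOutputStream[numStreams];
        for (int s = 0; s < numStreams; s++) {
            expected[s] = new ByteArrayOutputStream();
        }
        for (int i = 0; i < numWrites; i++) {
            int s = random.nextInt(numStreams);
            byte b = (byte) random.nextInt(256);
            slices.write(s, b);
            expected[s].write(b);
        }
        for (int s = 0; s < numStreams; s++) {
            if (!Arrays.equals(expected[s].toByteArray(), slices.read(s))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L, 10, 100_000) ? "ok" : "MISMATCH");
    }
}
```

Fixing the seed keeps a failing run reproducible, which matters for a randomized test of this kind.
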
> I expected the changes to reduce indexing throughput, so I ran a test
> indexing first 200K Wikipedia docs using this alg:
> {code}
> analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
> doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
> docs.file=/Volumes/External/lucene/wiki.txt
> doc.stored = true
> doc.term.vector = true
> doc.add.log.step=2000
> directory=FSDirectory
> autocommit=false
> compound=true
> ram.flush.mb=256
> { "Rounds"
>   ResetSystemErase
>   { "BuildIndex"
>     - CreateIndex
>     { "AddDocs" AddDoc > : 200000
>     - CloseIndex
>   }
>   NewRound
> } : 4
> RepSumByPrefRound BuildIndex
> {code}
> On trunk it produces these results:
> {code}
> Operation   round   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
> BuildIndex      0        1       200000        791.7      252.63   338,552,096  1,061,814,272
> BuildIndex -  - 1 -  -   1 -  -  200000 -  -   793.1 -  -  252.18 - 605,262,080  1,061,814,272
> BuildIndex      2        1       200000        794.8      251.63   601,966,528  1,061,814,272
> BuildIndex -  - 3 -  -   1 -  -  200000 -  -   782.5 -  -  255.58 - 608,699,712  1,061,814,272
> {code}
> and with the patch:
> {code}
> Operation   round   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
> BuildIndex      0        1       200000        745.0      268.47   338,318,784  1,061,814,272
> BuildIndex -  - 1 -  -   1 -  -  200000 -  -   792.7 -  -  252.30 - 605,331,776  1,061,814,272
> BuildIndex      2        1       200000        786.7      254.24   602,915,712  1,061,814,272
> BuildIndex -  - 3 -  -   1 -  -  200000 -  -   795.3 -  -  251.48 - 602,378,624  1,061,814,272
> {code}
> So it looks like the performance cost of this change is negligible (in
> the noise).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
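
As a quick check on the "in the noise" reading of the two result tables, the rec/s figures can be averaged per build. The sketch below does only that arithmetic in Java; the class name ThroughputComparison and the choice to also report means that skip round 0 (treating it as a possible warm-up round) are assumptions for illustration, not part of the original benchmark.

```java
/** Averages the rec/s figures quoted in the issue for trunk and for the patched build. */
final class ThroughputComparison {
    // rec/s for rounds 0..3, copied from the two result tables in the issue
    static final double[] TRUNK = {791.7, 793.1, 794.8, 782.5};
    static final double[] PATCH = {745.0, 792.7, 786.7, 795.3};

    /** Mean of values[from..end]; from = 1 skips round 0. */
    static double mean(double[] values, int from) {
        double sum = 0;
        for (int i = from; i < values.length; i++) {
            sum += values[i];
        }
        return sum / (values.length - from);
    }

    /** Relative change of the patched build versus trunk, in percent. */
    static double percentChange(double trunk, double patch) {
        return 100.0 * (patch - trunk) / trunk;
    }

    public static void main(String[] args) {
        for (int from = 0; from <= 1; from++) {
            double trunk = mean(TRUNK, from);
            double patch = mean(PATCH, from);
            System.out.printf("rounds %d-3: trunk %.1f rec/s, patch %.1f rec/s, change %+.2f%%%n",
                    from, trunk, patch, percentChange(trunk, patch));
        }
    }
}
```

With the quoted figures, the four-round means come to about 790.5 rec/s on trunk and 779.9 rec/s with the patch; skipping round 0 they are about 790.1 and 791.6 rec/s, so the gap comes almost entirely from the patched build's slower first round.
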