From: Marvin Humphrey
Subject: Re: [jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff
Date: Mon, 2 Apr 2007 14:59:09 -0700
To: java-dev@lucene.apache.org

On Apr 2, 2007, at 2:50 PM, Steven Parkes wrote:

> On the one hand, creating separate per-article files is "clean" in that
> when you then ingest, you only have disk i/o that's going to affect the
> ingest performance (as opposed to, say, uncompressing/parsing). On the
> other hand, that's a lot of disk i/o (compresses by about 5X) and a lot
> of directory lookups.

One reason I was expanding the elements into individual files was so
that I could compare different libraries against Lucene, including
those in other languages. It was important to measure the engines
themselves, not SGML parsers.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/