From java-dev-return-15788-apmail-lucene-java-dev-archive=lucene.apache.org@lucene.apache.org Fri Sep 22 19:00:31 2006 Return-Path: Delivered-To: apmail-lucene-java-dev-archive@www.apache.org Received: (qmail 18070 invoked from network); 22 Sep 2006 19:00:30 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur.apache.org with SMTP; 22 Sep 2006 19:00:30 -0000 Received: (qmail 52844 invoked by uid 500); 22 Sep 2006 19:00:23 -0000 Delivered-To: apmail-lucene-java-dev-archive@lucene.apache.org Received: (qmail 52805 invoked by uid 500); 22 Sep 2006 19:00:23 -0000 Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-dev@lucene.apache.org Delivered-To: mailing list java-dev@lucene.apache.org Received: (qmail 52751 invoked by uid 99); 22 Sep 2006 19:00:23 -0000 Received: from idunn.apache.osuosl.org (HELO idunn.apache.osuosl.org) (140.211.166.84) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 22 Sep 2006 12:00:23 -0700 X-ASF-Spam-Status: No, hits=0.0 required=5.0 tests= Received: from [209.237.227.198] ([209.237.227.198:34727] helo=brutus.apache.org) by idunn.apache.osuosl.org (ecelerity 2.1.1.8 r(12930)) with ESMTP id BC/F2-06791-0C234154 for ; Fri, 22 Sep 2006 12:00:17 -0700 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id A9B287142FE for ; Fri, 22 Sep 2006 18:56:24 +0000 (GMT) Message-ID: <15101919.1158951384692.JavaMail.jira@brutus> Date: Fri, 22 Sep 2006 11:56:24 -0700 (PDT) From: "Dawid Weiss (JIRA)" To: java-dev@lucene.apache.org Subject: [jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene In-Reply-To: <13524679.1158815782415.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N [ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436972 ] Dawid Weiss commented on LUCENE-675: ------------------------------------ First -- I think it's a good initiative. Grant, when you're thinking about the infrastructure, it would be pretty neat to have a way of logging performance in a way so that one could draw charts from them. You know, for the visual folks :) Anyway, my other idea is that benchmarking Lucene can be performed on two levels: one is the user level, where the entire operation counts (such as indexing, searching etc). Another aspect is measurement of atomic parts _within_ the big operation so that you know how much of the whole thing each subpart takes. I wrote an interesting piece of code once that allows measuring times for named operation (per-thread) in a recursive way. Looks something like this: perfLogger.start("indexing"); try { .. code (with recursion etc) ... perfLogger.start("subpart"); try { } finally { perfLogger.stop(); } } finally { perfLogger.stop(); } in the output you get something like this: indexing: 5 seconds; ->subpart: : 2 seconds; -> ... Of course everything comes at a price and the above logging costs some CPU cycles (my implementation stored a nesting stack in ThreadLocals). One can always put that code in 'if' clauses attached to final variables and enable logging only for benchmarking targets (the compiler will get rid of logging statements then). If folks are interested I can dig out that performance logger and maybe adopt it to what Grant comes up with. > Lucene benchmark: objective performance test for Lucene > ------------------------------------------------------- > > Key: LUCENE-675 > URL: http://issues.apache.org/jira/browse/LUCENE-675 > Project: Lucene - Java > Issue Type: Improvement > Reporter: Andrzej Bialecki > Assigned To: Grant Ingersoll > Attachments: LuceneBenchmark.java > > > We need an objective way to measure the performance of Lucene, both indexing and querying, on a known corpus. This issue is intended to collect comments and patches implementing a suite of such benchmarking tests. > Regarding the corpus: one of the widely used and freely available corpora is the original Reuters collection, available from http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz or http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz. I propose to use this corpus as a base for benchmarks. The benchmarking suite could automatically retrieve it from known locations, and cache it locally. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org For additional commands, e-mail: java-dev-help@lucene.apache.org