lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
Date Mon, 09 Oct 2006 22:10:22 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12440990 ] 
            
Grant Ingersoll commented on LUCENE-675:
----------------------------------------

OK, I have a preliminary implementation based on adapting Andrzej's approach.  The interesting
thing about this approach is that it is easy to make more or less exhaustive (i.e. by choosing
how many of the parameters the system alters as it runs).  Thus, you can have
it change the merge factors, max buffered docs, number of documents indexed, number of different
queries run, etc.  The tradeoff, of course, is the length of time it takes to run these.
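
To illustrate the tradeoff, here is a minimal sketch of such a parameter sweep. It is not taken
from LuceneBenchmark.java: the class BenchmarkGrid, the Run type, and the sample values are all
hypothetical, and no Lucene classes are used. It only enumerates the combinations a harness
would have to run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from LuceneBenchmark.java.
class BenchmarkGrid {

    /** One combination of settings for a single benchmark run. */
    static final class Run {
        final int mergeFactor;
        final int maxBufferedDocs;
        final int numDocs;

        Run(int mergeFactor, int maxBufferedDocs, int numDocs) {
            this.mergeFactor = mergeFactor;
            this.maxBufferedDocs = maxBufferedDocs;
            this.numDocs = numDocs;
        }

        @Override
        public String toString() {
            return "mergeFactor=" + mergeFactor
                + " maxBufferedDocs=" + maxBufferedDocs
                + " numDocs=" + numDocs;
        }
    }

    /** Cartesian product of the candidate values for each parameter. */
    static List<Run> combinations(int[] mergeFactors, int[] maxBufferedDocs, int[] numDocs) {
        List<Run> runs = new ArrayList<>();
        for (int mf : mergeFactors) {
            for (int mbd : maxBufferedDocs) {
                for (int n : numDocs) {
                    runs.add(new Run(mf, mbd, n));
                }
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        // Illustrative values only (assumed, not from the issue).
        List<Run> runs = combinations(
            new int[] {10, 100},
            new int[] {10, 100, 1000},
            new int[] {1000, 10000});
        System.out.println(runs.size() + " runs");
        for (Run r : runs) {
            System.out.println(r);
        }
    }
}
```

Each extra candidate value multiplies the number of runs, which is why the more exhaustive
settings take so much longer.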

So my question to those interested is: what is a good baseline running time for testing in
a standard way?  My initial thought is to have something that takes 15-30 minutes
to run, but I am not sure about this.  Another approach would be to have three "baselines":
1. quick validation (5 minutes to run...)
2. standard (15-45 minutes)
3. exhaustive (1-10 hours)
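
As a rough sketch of how the three baselines could constrain a run, the following takes the
upper bound of each proposed range as a time budget and computes how many runs fit in it. The
names BenchmarkProfiles, Profile, and maxRuns are hypothetical, and the fixed per-run time is an
assumption made purely for illustration.

```java
// Hypothetical sketch: profile names and budgets mirror the three proposed baselines.
class BenchmarkProfiles {

    enum Profile {
        QUICK(5),             // quick validation: about 5 minutes
        STANDARD(45),         // standard: upper end of 15-45 minutes
        EXHAUSTIVE(10 * 60);  // exhaustive: upper end of 1-10 hours

        final int maxMinutes;

        Profile(int maxMinutes) {
            this.maxMinutes = maxMinutes;
        }
    }

    /** Number of runs that fit in the profile's budget, assuming a fixed time per run. */
    static long maxRuns(Profile profile, long secondsPerRun) {
        if (secondsPerRun <= 0) {
            throw new IllegalArgumentException("secondsPerRun must be positive");
        }
        return (profile.maxMinutes * 60L) / secondsPerRun;
    }

    public static void main(String[] args) {
        long secondsPerRun = 20; // assumed value, not a measurement
        for (Profile p : Profile.values()) {
            System.out.println(p + ": up to " + maxRuns(p, secondsPerRun) + " runs");
        }
    }
}
```

In practice the time per run depends on the parameters themselves (for example, the number of
documents indexed), so a real harness would need measured timings rather than a constant.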
 

I know several others have built benchmarking suites for their internal use.  What has been
your strategy?

Thoughts, ideas, insights?

Thanks,
Grant

> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-675
>                 URL: http://issues.apache.org/jira/browse/LUCENE-675
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki 
>         Assigned To: Grant Ingersoll
>         Attachments: LuceneBenchmark.java
>
>
> We need an objective way to measure the performance of Lucene, both indexing and querying,
> on a known corpus. This issue is intended to collect comments and patches implementing a suite
> of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is the original
> Reuters collection, available from http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz
> or http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz. I
> propose to use this corpus as a base for benchmarks. The benchmarking suite could automatically
> retrieve it from known locations, and cache it locally.

        
