lucene-dev mailing list archives

From "Mike Klaas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
Date Fri, 22 Sep 2006 17:31:23 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436934 ] 
            
Mike Klaas commented on LUCENE-675:
-----------------------------------

A few notes on benchmarks:

First, it is important to realize that no benchmark will ever fully capture all aspects of
Lucene performance, particularly since real-world data distributions are so varied.
That said, benchmarks are useful tools, especially if they are componentized to measure various
aspects of Lucene performance (the narrower the goal of the benchmark, the better a benchmark
can be created).

It is rather unrealistic to expect to standardize hardware / OS ... better to compare before/after
numbers on a single configuration than to compare numbers across configurations.
The test process _is_ important, but anything crucial should be built into the test (like
the number of iterations, taking the average, etc.).  Concerning the specifics of this: requiring
reboots is onerous and not an important criterion (at least for Unix systems; I'm not sufficiently
familiar with Windows to comment).  Better to stipulate a relatively quiescent machine.  Or
perhaps not: it might be useful to see how machine load affects Lucene performance.  Also,
the arithmetic mean is a terrible way of combining results due to its emphasis on outliers.
Better is the average of the minimum times taken over small sets of runs.
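
To make the last point concrete, here is a minimal sketch of the "average over minimum times" idea, next to the plain arithmetic mean for comparison. It is not taken from the attached LuceneBenchmark.java; the class name, method names, and sample timings are all hypothetical and chosen only for illustration. It assumes the timings are split into consecutive batches of equal size.

```java
public class MinOfBatchesTimer {

    /** Times a task `runs` times and returns the raw wall-clock samples in nanoseconds. */
    public static long[] time(Runnable task, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    /** Plain arithmetic mean; a single slow run pulls this up noticeably. */
    public static double arithmeticMean(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /**
     * Splits the samples into consecutive batches of batchSize, takes the
     * minimum of each batch, and returns the mean of those minimums.
     */
    public static double meanOfBatchMinimums(long[] samples, int batchSize) {
        if (batchSize <= 0 || samples.length == 0 || samples.length % batchSize != 0) {
            throw new IllegalArgumentException("need a non-empty, whole number of batches");
        }
        int batches = samples.length / batchSize;
        double sum = 0;
        for (int b = 0; b < batches; b++) {
            long min = Long.MAX_VALUE;
            for (int i = 0; i < batchSize; i++) {
                min = Math.min(min, samples[b * batchSize + i]);
            }
            sum += min;
        }
        return sum / batches;
    }

    public static void main(String[] args) {
        // Made-up timings; the third run is an outlier (e.g. a GC pause or disk flush).
        long[] samples = {100, 102, 950, 101, 99, 103};
        System.out.println("arithmetic mean:        " + arithmeticMean(samples));
        System.out.println("mean of batch minimums: " + meanOfBatchMinimums(samples, 3));
    }
}
```

With these made-up samples the single 950 outlier drags the arithmetic mean to 242.5, while the mean of the two batch minimums (100 and 99) is 99.5. The batch size is a judgment call: larger batches filter noise more aggressively but also hide genuine variance.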

Of course, any scheme has its problems.  In general, the most important thing when using benchmarks
is being aware of the limitations of the benchmark and methodology used.

> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-675
>                 URL: http://issues.apache.org/jira/browse/LUCENE-675
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki 
>         Assigned To: Grant Ingersoll
>         Attachments: LuceneBenchmark.java
>
>
> We need an objective way to measure the performance of Lucene, both indexing and querying,
> on a known corpus. This issue is intended to collect comments and patches implementing a suite
> of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is the original
> Reuters collection, available from http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz
> or http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz. I
> propose to use this corpus as a base for benchmarks. The benchmarking suite could automatically
> retrieve it from known locations, and cache it locally.


        


