lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
Date Fri, 22 Sep 2006 18:15:24 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436949 ] 
            
Grant Ingersoll commented on LUCENE-675:
----------------------------------------

My comments are marked by GSI
-----------

In the meantime I've been using Europarl for my testing.

GSI: perhaps you can contribute once this is set up

It's also important to realize that there are many dimensions to test. With
lock-less I'm focusing entirely on "wall clock time to open readers
and writers" in different use cases like pure indexing, pure
searching, highly interactive mixed indexing/searching, etc. And this
is actually hard to test cleanly, because in certain cases (the highly
interactive case, or the many-readers case) the current Lucene hits many
"commit lock" retries and/or timeouts (whereas lock-less doesn't). So
what's a "fair" comparison in this case?
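
GSI: For reference, the kind of measurement being described is roughly the following.
This is just a sketch against the current API; the index path and iteration count are
placeholders, and a real run would also have to exercise the interactive mixed cases.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public class OpenTimes {
  public static void main(String[] args) throws Exception {
    String indexPath = "/tmp/bench-index"; // placeholder: path to an existing index
    int iterations = 100;                  // placeholder: enough repeats to smooth out noise

    // Wall clock time to repeatedly open (and close) a reader.
    long start = System.currentTimeMillis();
    for (int i = 0; i < iterations; i++) {
      IndexReader reader = IndexReader.open(indexPath);
      reader.close();
    }
    long readerMillis = System.currentTimeMillis() - start;

    // Wall clock time to repeatedly open (and close) a writer on the same index.
    start = System.currentTimeMillis();
    for (int i = 0; i < iterations; i++) {
      IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
      writer.close();
    }
    long writerMillis = System.currentTimeMillis() - start;

    System.out.println("reader open: " + (readerMillis / (float) iterations) + " ms average");
    System.out.println("writer open: " + (writerMillis / (float) iterations) + " ms average");
  }
}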

GSI: I am planning on taking Andrzej's contribution and refactoring it into components that
can be reused, as well as creating a "standard" benchmark which will be easy to run through
a simple ant task, i.e. ant run-baseline.

GSI: From here, anybody can contribute their own benchmarks (I will provide interfaces to
facilitate this), which others can choose to run.
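
GSI: To make that concrete, here is a rough sketch of the sort of interfaces I mean. The
names are placeholders only, not a final API, and each type would go in its own source file.

/** One runnable benchmark, e.g. pure indexing or pure searching. */
public interface Benchmarker {
  /** Runs a single pass of the benchmark and returns its timings. */
  BenchmarkResult run(BenchmarkOptions options) throws Exception;
}

/** Settings shared by all benchmarks: where the index and corpus live, how many runs, etc. */
public interface BenchmarkOptions {
  String getIndexDirectory();
  String getCorpusDirectory();
  int getNumRuns();
}

/** Simple value object holding the outcome of one benchmark pass. */
public class BenchmarkResult {
  private final String name;
  private final long elapsedMillis;

  public BenchmarkResult(String name, long elapsedMillis) {
    this.name = name;
    this.elapsedMillis = elapsedMillis;
  }

  public String getName() { return name; }
  public long getElapsedMillis() { return elapsedMillis; }
}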


In addition to standardizing on the corpus, I think we ideally need a
standardized hardware / OS / software configuration as well, so the
numbers are easily comparable across time.

GSI: Not really feasible unless you are proposing to buy us machines :-)  I think what's more
important is the ability to do a before-and-after evaluation (one that runs each test several
times) as you make changes. Anybody should be able to do the same: run the benchmark, apply
the patch, and then rerun the benchmark.
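
GSI: In terms of the interfaces sketched above, that before/after evaluation could be driven
by something like this (again only illustrative): run it, apply the patch, run it again with
the same options, and compare the two summaries.

/** Illustrative driver: run the same benchmark several times and summarize the timings. */
public class BeforeAfterRunner {
  public static void summarize(Benchmarker benchmark, BenchmarkOptions options) throws Exception {
    int runs = options.getNumRuns();
    long total = 0;
    long best = Long.MAX_VALUE;
    for (int i = 0; i < runs; i++) {
      BenchmarkResult result = benchmark.run(options);
      total += result.getElapsedMillis();
      best = Math.min(best, result.getElapsedMillis());
    }
    System.out.println("mean " + (total / (float) runs) + " ms, best " + best
        + " ms over " + runs + " runs");
  }
}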


> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-675
>                 URL: http://issues.apache.org/jira/browse/LUCENE-675
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki 
>         Assigned To: Grant Ingersoll
>         Attachments: LuceneBenchmark.java
>
>
> We need an objective way to measure the performance of Lucene, both indexing and querying,
> on a known corpus. This issue is intended to collect comments and patches implementing a suite
> of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is the original
> Reuters collection, available from http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz
> or http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz. I
> propose to use this corpus as a base for benchmarks. The benchmarking suite could automatically
> retrieve it from known locations, and cache it locally.
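
As an illustrative sketch only, the "retrieve it from known locations, and cache it locally"
step could look roughly like the following; the class name and cache layout are placeholders,
and unpacking the downloaded archive is left out.

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

public class CorpusFetcher {
  /** Downloads the corpus archive into cacheDir unless a copy is already there. */
  public static File fetch(String corpusUrl, File cacheDir) throws Exception {
    String fileName = corpusUrl.substring(corpusUrl.lastIndexOf('/') + 1);
    File cached = new File(cacheDir, fileName);
    if (cached.exists() && cached.length() > 0) {
      return cached; // already cached locally, skip the download
    }
    cacheDir.mkdirs();
    InputStream in = new URL(corpusUrl).openStream();
    OutputStream out = new FileOutputStream(cached);
    try {
      byte[] buffer = new byte[8192];
      int read;
      while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
      }
    } finally {
      out.close();
      in.close();
    }
    return cached;
  }
}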

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

