lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
Date Tue, 07 Nov 2006 13:04:53 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12447781 ] 
            
Grant Ingersoll commented on LUCENE-675:
----------------------------------------

1st run downloaded the documents from the Web before starting to index.
2nd run started right away, as the input docs were already in place. Great.

It seems the only output is what is printed to stdout, right?


GSI: The Benchmarker interface does return the TimeData, so other implementations or calling
code could use the results programmatically.
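Returning measurements rather than only printing them means a caller can post-process a run, for example to find the slowest phase or to compare before/after numbers. The sketch below assumes a much simplified shape: the patch's actual TimeData and Benchmarker signatures are not shown in this thread, so the fields, the no-argument benchmark() method, ratePerSecond() and ResultReport.slowest() are all hypothetical.

```java
import java.util.List;

// Hypothetical, simplified stand-in for the patch's TimeData:
// one named measurement with an operation count and elapsed time.
class TimeData {
    final String name;
    final long count;
    final long elapsedMillis;

    TimeData(String name, long count, long elapsedMillis) {
        this.name = name;
        this.count = count;
        this.elapsedMillis = elapsedMillis;
    }

    // Operations per second; 0 when no time was recorded.
    double ratePerSecond() {
        return elapsedMillis == 0 ? 0.0 : count * 1000.0 / elapsedMillis;
    }
}

// Hypothetical interface shape: a run hands back its measurements.
interface Benchmarker {
    List<TimeData> benchmark();
}

class ResultReport {
    // Example of using results programmatically: pick the lowest-throughput phase.
    static TimeData slowest(List<TimeData> results) {
        TimeData worst = null;
        for (TimeData t : results) {
            if (worst == null || t.ratePerSecond() < worst.ratePerSecond()) {
                worst = t;
            }
        }
        return worst;
    }
}
```

The same list could just as easily be written to a file or diffed against a previous run, which stdout-only output does not allow.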



I like the logic of loading test data from the Web very much, and the scaleUp and
maximumDocumentsToIndex params are handy.

It seems that all the test logic and some of its data (queries) are hard-coded in Java. I initially
thought of a setting where we define parameterized tasks/jobs, like:

- createIndex(params)
- writeToIndex(params):
  - addDocs()
  - optimize()
- readFromIndex(params):
  - searchIndex()
  - fetchData()
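The task list above maps naturally onto a small tree of composable steps: leaves do one thing (addDocs, optimize, searchIndex, fetchData) and composites run their children in order. The sketch below is one way to express that, assuming a hypothetical Task interface that takes a parameter map and returns elapsed nanoseconds; LeafTask, SequenceTask and the trace list are illustrative names, not part of the patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical task abstraction: each step receives named parameters and
// reports how long it took, so any tree of steps can be configured and timed.
interface Task {
    // Runs the task, appends task names to trace in execution order,
    // and returns the elapsed wall-clock time in nanoseconds.
    long run(Map<String, String> params, List<String> trace);
}

// A leaf step such as addDocs(), optimize(), searchIndex() or fetchData().
class LeafTask implements Task {
    private final String name;
    private final Consumer<Map<String, String>> action;

    LeafTask(String name, Consumer<Map<String, String>> action) {
        this.name = name;
        this.action = action;
    }

    @Override
    public long run(Map<String, String> params, List<String> trace) {
        trace.add(name);
        long start = System.nanoTime();
        action.accept(params);
        return System.nanoTime() - start;
    }
}

// A composite step such as writeToIndex(params) that runs its children in order.
class SequenceTask implements Task {
    private final String name;
    private final List<Task> children;

    SequenceTask(String name, Task... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }

    @Override
    public long run(Map<String, String> params, List<String> trace) {
        trace.add(name);
        long total = 0;
        for (Task child : children) {
            total += child.run(params, trace);
        }
        return total;
    }
}
```

Because composites only sequence other tasks, a config file could describe the whole tree, and the "standard" benchmark would simply be one fixed tree.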


GSI: I definitely agree that we want a more flexible benchmark to meet people's needs. I wanted
at least one test that is "standard" in that you can't change its parameters or test cases, so
that we are all on the same page for a run. Then, when people discuss performance, they can say
"I ran the standard benchmark before and after, and here are the results," and we all know what
they are talking about. I think all the components for a parameterized version are already
there; all it takes is for someone to extend the Standard one or implement their own that reads
in a config file. I will try to put in a fully parameterized version soon.
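One way to keep a fixed "standard" run while also allowing a config-file-driven one is to treat the standard values as defaults that a custom config may override. The sketch below assumes a java.util.Properties-style file; the BenchmarkConfig class and its methods are hypothetical, the key names simply reuse the scaleUp and maximumDocumentsToIndex params mentioned in this thread, and the values in standard() are placeholders rather than the patch's real defaults.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical run settings; values in standard() are placeholders.
class BenchmarkConfig {
    final int scaleUp;
    final int maximumDocumentsToIndex;

    BenchmarkConfig(int scaleUp, int maximumDocumentsToIndex) {
        this.scaleUp = scaleUp;
        this.maximumDocumentsToIndex = maximumDocumentsToIndex;
    }

    // The fixed "standard" settings: not configurable, so results stay comparable.
    static BenchmarkConfig standard() {
        return new BenchmarkConfig(1, Integer.MAX_VALUE);
    }

    // A custom run: any key missing from the config falls back to the standard value.
    static BenchmarkConfig fromProperties(Properties props) {
        BenchmarkConfig std = standard();
        int scaleUp = Integer.parseInt(
                props.getProperty("scaleUp", String.valueOf(std.scaleUp)).trim());
        int maxDocs = Integer.parseInt(
                props.getProperty("maximumDocumentsToIndex",
                        String.valueOf(std.maximumDocumentsToIndex)).trim());
        return new BenchmarkConfig(scaleUp, maxDocs);
    }

    // Parses config-file text in java.util.Properties format (key=value lines).
    static BenchmarkConfig parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fromProperties(props);
    }
}
```

Falling back to the standard values keeps a partially specified config close to the reference run, so differences in results are easier to attribute to the settings that were actually changed.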


GSI: Thanks for the fixes, I will incorporate into my version and post another patch soon.

> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-675
>                 URL: http://issues.apache.org/jira/browse/LUCENE-675
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki 
>         Assigned To: Grant Ingersoll
>         Attachments: benchmark.patch, BenchmarkingIndexer.pm, extract_reuters.plx, LuceneBenchmark.java, LuceneIndexer.java, timedata.zip
>
>
> We need an objective way to measure the performance of Lucene, both indexing and querying,
> on a known corpus. This issue is intended to collect comments and patches implementing a suite
> of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is the original
> Reuters collection, available from http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz
> or http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz. I
> propose to use this corpus as a base for benchmarks. The benchmarking suite could automatically
> retrieve it from known locations and cache it locally.
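The "retrieve from known locations, cache locally" behavior (and the observation that the 2nd run skipped the download) can be sketched as a cache check followed by a mirror loop. The CorpusCache class, its fetch method and the download-to-temporary-file step are assumptions for illustration; the benchmark patch's own download code is not shown in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

class CorpusCache {
    // Returns the local copy of fileName under cacheDir, downloading it from the
    // first mirror that works only when no cached copy exists yet.
    static Path fetch(List<String> mirrors, Path cacheDir, String fileName) throws IOException {
        Path target = cacheDir.resolve(fileName);
        if (Files.exists(target)) {
            return target; // cache hit: later runs skip the download
        }
        Files.createDirectories(cacheDir);
        IOException lastFailure = new IOException("no mirror could provide " + fileName);
        for (String mirror : mirrors) {
            // Download to a temporary file first so an interrupted transfer
            // never leaves a truncated archive that looks like a valid cache.
            Path partial = Files.createTempFile(cacheDir, fileName, ".part");
            try (InputStream in = URI.create(mirror).toURL().openStream()) {
                Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
                Files.move(partial, target, StandardCopyOption.ATOMIC_MOVE);
                return target;
            } catch (IOException e) {
                Files.deleteIfExists(partial);
                lastFailure = e; // try the next mirror
            }
        }
        throw lastFailure;
    }
}
```

Passing the two URLs listed above as the mirrors would try the CMU copy first and fall back to the MIT copy only if the first download fails.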

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

