lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
Date Sun, 12 Nov 2006 10:02:39 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12449117 ] 
            
Doron Cohen commented on LUCENE-675:
------------------------------------

I looked at extending the benchmark with:
- different test "scenarios", i.e. other sequences of operations.
- multithreaded tests, e.g. several queries in parallel.
- rate of events, e.g. "2 queries arriving per second", or "one query per second in parallel
with 20 new documents in a minute".
- different data sources (input documents, queries).

For this I made lots of changes to the benchmark code, using parts of it and rewriting other parts.
I would like to submit this code in a few days - it is running already but some functionality is missing.

I would like to describe how it works to hopefully get early feedback. 

There are several "basic tasks" defined - all extending an (abstract) class PerfTask:
- AddDocTask
- OptimizeTask
- CreateIndexTask
etc. 

To further extend the benchmark 'framework', new tasks can be added. Each task must implement the abstract method doLogic(). For instance, in AddDocTask, doLogic() would call indexWriter.addDocument().
There are also setup() and tearDown() methods for performing work that should not be timed as part of that task.
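
A rough sketch of that task lifecycle, not the code from the pending patch: PerfTask, setup(), doLogic() and tearDown() are the names used above, while runAndTime() and RecordingTask are hypothetical names added for illustration, and all signatures are assumptions. RecordingTask stands in for something like AddDocTask so that the example runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; signatures are assumed, not taken from the patch.
abstract class PerfTask {
    /** Untimed preparation; does nothing by default. */
    protected void setup() {}

    /** The timed work; every concrete task must implement this. */
    protected abstract void doLogic();

    /** Untimed cleanup; does nothing by default. */
    protected void tearDown() {}

    /** Hypothetical driver: returns the nanoseconds spent in doLogic() only. */
    public final long runAndTime() {
        setup();
        try {
            long start = System.nanoTime();
            doLogic();
            return System.nanoTime() - start;
        } finally {
            tearDown(); // runs even if doLogic() throws
        }
    }
}

// Hypothetical stand-in for a task such as AddDocTask, whose doLogic()
// would call indexWriter.addDocument() instead of recording a string.
class RecordingTask extends PerfTask {
    final List<String> calls = new ArrayList<>();

    @Override protected void setup()    { calls.add("setup"); }
    @Override protected void doLogic()  { calls.add("doLogic"); }
    @Override protected void tearDown() { calls.add("tearDown"); }
}
```

In this sketch, new RecordingTask().runAndTime() calls the three methods in order and reports only the time spent inside doLogic().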

A special TaskSequence task contains other tasks. It is either sequential or parallel, which determines whether it executes its child tasks serially or in parallel.
TaskSequence also supports a "rate": the pace at which its child tasks are "fired" can be controlled.
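
One way such a sequence could be sketched, under assumptions: only the name TaskSequence comes from the description above. Modelling children as Runnable, the constructor parameters, one thread per parallel child, and a rate expressed per minute (0 meaning "no pacing") are all choices made for this illustration and may differ from the actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; the constructor and the Runnable children are assumptions.
class TaskSequence implements Runnable {
    private final List<Runnable> children;
    private final boolean parallel;
    private final int ratePerMinute; // 0 = fire children as fast as possible

    TaskSequence(List<Runnable> children, boolean parallel, int ratePerMinute) {
        this.children = children;
        this.parallel = parallel;
        this.ratePerMinute = ratePerMinute;
    }

    @Override
    public void run() {
        long intervalMillis = ratePerMinute > 0 ? 60_000L / ratePerMinute : 0L;
        List<Thread> started = new ArrayList<>();
        long nextFire = System.currentTimeMillis();
        try {
            for (Runnable child : children) {
                // Pace the firing: wait until this child's scheduled start time.
                long wait = nextFire - System.currentTimeMillis();
                if (wait > 0) Thread.sleep(wait);
                nextFire += intervalMillis;

                if (parallel) {
                    Thread t = new Thread(child); // fire and move on
                    started.add(t);
                    t.start();
                } else {
                    child.run();                  // finish before firing the next one
                }
            }
            // A parallel sequence is done only when all of its children are done.
            for (Thread t : started) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because TaskSequence is itself a Runnable here, a sequence can be a child of another sequence, which matches the idea of a task that contains other tasks.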

With these tasks, it is possible to describe a performance test 'algorithm' in a simple syntax.
('algorithm' may be too big a word for this...?)

A test invocation takes two parameters: 
- test.properties - file with various config properties.
- test.alg - file with the algorithm.

By convention, for each task class "OpNameTask", the command "OpName" is valid in test.alg.

Adding a single document is done by:
    AddDoc

Adding 3 documents:
    AddDoc
    AddDoc
    AddDoc

Or, alternatively:
    { AddDoc } : 3

So, '{' and '}' indicate a serial sequence of (child) tasks. 

To fire 100 queries in a row:
    { Search } : 100

To fire 100 queries in parallel:
    [ Search ] : 100

So, '[' and ']' indicate a parallel group of tasks. 

To fire 100 queries in a row, at a rate of 2 queries per second (120 per minute):
    { Search } : 100 : 120

The same, but in parallel:
    [ Search ] : 100 : 120

A sequence task can be named, to identify it in reports:
    { "QueriesA" Search } : 100 : 120
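
To make the one-line forms above concrete, here is a minimal parsing sketch. It is not the parser from the patch: AlgLine, its fields, and intervalMillis() are hypothetical names, the defaults (repeat 1, rate 0 when omitted) are assumptions, and it handles only the single bracketed forms shown in this post, with no nesting and no bare commands.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; covers only the bracketed one-line forms shown in this post.
final class AlgLine {
    final boolean parallel;   // '[' ... ']' rather than '{' ... '}'
    final String name;        // quoted sequence name, or null if unnamed
    final String[] tasks;     // child task commands, e.g. "Search"
    final int repeat;         // first ':' number; assumed 1 if absent
    final int ratePerMinute;  // second ':' number; assumed 0 (no pacing) if absent

    // open bracket, optional "name", task words, close bracket, optional ": N", optional ": N"
    private static final Pattern LINE = Pattern.compile(
        "\\s*([\\[{])\\s*(?:\"([^\"]*)\"\\s*)?(.*?)\\s*([\\]}])"
        + "\\s*(?::\\s*(\\d+)\\s*)?(?::\\s*(\\d+)\\s*)?");

    private AlgLine(boolean parallel, String name, String[] tasks, int repeat, int ratePerMinute) {
        this.parallel = parallel;
        this.name = name;
        this.tasks = tasks;
        this.repeat = repeat;
        this.ratePerMinute = ratePerMinute;
    }

    static AlgLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized line: " + line);
        }
        boolean parallel = m.group(1).equals("[");
        String expectedClose = parallel ? "]" : "}";
        if (!m.group(4).equals(expectedClose)) {
            throw new IllegalArgumentException("Mismatched brackets: " + line);
        }
        String body = m.group(3).trim();
        String[] tasks = body.isEmpty() ? new String[0] : body.split("\\s+");
        int repeat = m.group(5) == null ? 1 : Integer.parseInt(m.group(5));
        int rate = m.group(6) == null ? 0 : Integer.parseInt(m.group(6));
        return new AlgLine(parallel, m.group(2), tasks, repeat, rate);
    }

    /** Milliseconds between firings implied by the per-minute rate; 0 if no rate was given. */
    long intervalMillis() {
        return ratePerMinute > 0 ? 60_000L / ratePerMinute : 0L;
    }
}
```

With this sketch, a rate of 120 per minute works out to one firing every 500 ms, which is the "2 queries per second" mentioned above.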

And there are tasks that create reports. 

There are more tasks, and more to say about the alg syntax, but this post is already long.

I find this quite powerful for perf testing.
What do you (and you) think?

- Doron


> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-675
>                 URL: http://issues.apache.org/jira/browse/LUCENE-675
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki 
>         Assigned To: Grant Ingersoll
>         Attachments: benchmark.patch, BenchmarkingIndexer.pm, extract_reuters.plx, LuceneBenchmark.java, LuceneIndexer.java, timedata.zip
>
>
> We need an objective way to measure the performance of Lucene, both indexing and querying, on a known corpus. This issue is intended to collect comments and patches implementing a suite of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is the original Reuters collection, available from http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz or http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz. I propose to use this corpus as a base for benchmarks. The benchmarking suite could automatically retrieve it from known locations, and cache it locally.

