lucene-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <>
Subject [jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
Date Thu, 21 Sep 2006 12:38:25 GMT
Andrzej Bialecki  commented on LUCENE-675:

The dependency on commons-compress could be avoided: I used it only to unpack
tar.gz files, and we can use Ant for that. If you meant the dependency on the corpus, can't Ant
download that too, as a dependency?

Re: Project Gutenberg - good point, it is a good source of multilingual documents. The
"Europarl" collection is another, although considerably larger, so it could be suitable for
running large-scale benchmarks, with texts from Project Gutenberg for small-scale tests.

> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>                 Key: LUCENE-675
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki 
>         Attachments:
> We need an objective way to measure the performance of Lucene, both indexing and querying,
> on a known corpus. This issue is intended to collect comments and patches implementing a suite
> of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is the original
> Reuters collection, available from
> or I
> propose to use this corpus as a base for benchmarks. The benchmarking suite could automatically
> retrieve it from known locations, and cache it locally.
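
For illustration only, the "retrieve it from known locations, and cache it locally" step quoted above could be sketched in Java roughly as follows. This is not taken from the attached patch: the class name CorpusCache and the method fetch are hypothetical, and the sketch assumes a JDK that provides java.nio.file. It downloads a file only when no cached copy exists yet, and leaves unpacking (for example with Ant) as a separate step.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for illustration; not part of Lucene or the patch. */
final class CorpusCache {

    private CorpusCache() {}

    /**
     * Returns the local copy of the file at {@code source}, downloading it
     * into {@code cacheDir} only when no cached copy exists yet.
     */
    static Path fetch(URL source, Path cacheDir) throws IOException {
        Files.createDirectories(cacheDir);

        // Name the cached file after the last path segment of the URL.
        String path = source.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty()) {
            throw new IllegalArgumentException("URL has no file name: " + source);
        }
        Path target = cacheDir.resolve(name);

        // Cache hit: reuse the earlier download without touching the network.
        if (Files.exists(target)) {
            return target;
        }

        // Download to a temporary file first, so an interrupted transfer
        // never leaves a truncated file under the final name.
        Path partial = Files.createTempFile(cacheDir, name, ".part");
        try (InputStream in = source.openStream()) {
            Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
            Files.move(partial, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up after a failure.
            Files.deleteIfExists(partial);
        }
        return target;
    }
}
```

A benchmark driver would call CorpusCache.fetch(corpusUrl, cacheDir) once before indexing. The corpus URL is deliberately left as a parameter, because the locations are missing from the archived message. A cached copy is never refreshed in this sketch, which suits a fixed benchmark corpus but would need a timestamp or checksum check for data that changes.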
