lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
Date Thu, 16 Nov 2006 20:17:40 GMT
     [ http://issues.apache.org/jira/browse/LUCENE-675?page=all ]

Doron Cohen updated LUCENE-675:
-------------------------------

    Attachment: benchmark.byTask.patch

I am attaching benchmark.byTask.patch - to be applied in the contrib/benchmark directory.


The root package of the byTask classes was changed to org.apache.lucene.benchmark.byTask, along
the lines of Grant's suggestion. This seems better because it keeps all benchmark classes under
lucene.benchmark.

I added a sample .alg file under conf and added some documentation.

Documentation-wise, the entry point is the package doc for org.apache.lucene.benchmark.byTask.

Thanks for any comments on this!

PS. Before submitting the patch file, I tried to apply it myself to a clean version of the
code, just to make sure that it works. But I got errors like this -- Could not retrieve revision
0 of "...\byTask\.." -- for every file under a new folder. So I am not sure whether it is just
my (Windows) svn patch-applying utility, or whether it is really impossible to apply a patch that
creates files in (yet) nonexistent directories. I searched the Lucene and SVN mailing
lists and went through the SVN book again, but nowhere could I find the expected
behavior for applying a patch containing new directories. In fact, "svn diff" would not even
show files that are new (again, this is the Windows svn 1.4.2 version). (I used TortoiseSVN
to create the patch.) This is rather annoying, and I might be misunderstanding something
basic about SVN, but I thought it would be better to share this experience here; it might save
some time for others trying to apply this patch or other patches
 ...

> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-675
>                 URL: http://issues.apache.org/jira/browse/LUCENE-675
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki 
>         Assigned To: Grant Ingersoll
>         Attachments: benchmark.byTask.patch, benchmark.patch, BenchmarkingIndexer.pm,
> extract_reuters.plx, LuceneBenchmark.java, LuceneIndexer.java, taskBenchmark.zip, timedata.zip,
> tiny.alg, tiny.properties
>
>
> We need an objective way to measure the performance of Lucene, both indexing and querying,
> on a known corpus. This issue is intended to collect comments and patches implementing a suite
> of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is the original
> Reuters collection, available from http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz
> or http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz. I
> propose to use this corpus as a base for benchmarks. The benchmarking suite could automatically
> retrieve it from known locations, and cache it locally.
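
The "retrieve it from known locations, and cache it locally" idea in the issue description could be sketched roughly as follows. This is only an illustration and is not taken from any of the attached patches: the class name CorpusCache and the method fetch are hypothetical, and the sketch uses java.nio.file, which is newer than the JDKs Lucene targeted when this message was written.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not part of the patch): download a corpus archive once
// and reuse the local copy on later benchmark runs.
final class CorpusCache {

    /**
     * Returns cacheFile if it already exists; otherwise tries each source URL
     * in order and stores the first one that can be read at cacheFile.
     */
    static Path fetch(String[] sources, Path cacheFile) throws IOException {
        if (Files.exists(cacheFile)) {
            return cacheFile; // cache hit: no network access needed
        }
        Path parent = cacheFile.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        IOException last = null;
        for (String source : sources) {
            // Download into a temporary file first, so that an interrupted
            // transfer never leaves a truncated file in the cache location.
            Path tmp = Files.createTempFile(parent, "corpus", ".part");
            try (InputStream in = URI.create(source).toURL().openStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
                Files.move(tmp, cacheFile, StandardCopyOption.REPLACE_EXISTING);
                return cacheFile;
            } catch (IOException e) {
                last = e; // remember the failure and try the next location
                Files.deleteIfExists(tmp);
            }
        }
        throw new IOException("could not retrieve corpus from any known location", last);
    }
}
```

A benchmark run would call fetch with the list of known corpus locations and a path under its working directory. Only the first run touches the network; later runs get the cached file back immediately.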


        
