I'm planning to work on incorporating Mike's Python scripts into the
Java benchmark code. I'd also like to keep track of overall suggestions
for improving contrib/benchmark. Perhaps I should open an issue
so people can post suggestions? That way I can review and implement
them (otherwise I'll forget, or they'll be lost on the dev
list). Marvin may think of improvements in the midst of porting
(it seems he already has).

On Sat, Feb 7, 2009 at 8:00 AM, Michael McCandless <lucene@mikemccandless.com> wrote:

You'll also need at least some of the *QueryMaker under feeds.

You might also want to make an improvement: change the QueryMaker API
to include both the query and the "arrival time" of that query.  And
then fix all ReadTask (and Search*Task) subclasses so that queries are
executed at their scheduled time (assuming enough threads & hardware).

This way one could play back a true search log and measure "realistic"
query latencies, or concoct synthetic worst cases (say, 4 very hard
queries arriving at once) and see how performance degrades under
load.
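To make the idea concrete, here's a minimal sketch of what such an API could look like. Nothing below exists in contrib/benchmark; TimedQuery, TimedQueryMaker, delayMillis, and replay are all hypothetical names, and a plain String stands in for org.apache.lucene.search.Query to keep the example self-contained:

```java
import java.util.List;

// Hypothetical sketch: pair each query with an arrival time so a
// ReadTask-like driver can replay a search log on schedule.
public class TimedQueryMaker {

    /** A query plus the offset (ms from run start) at which it should fire. */
    public static class TimedQuery {
        public final String query;    // stand-in for a real Lucene Query
        public final long arrivalMs;  // scheduled offset from start of run
        public TimedQuery(String query, long arrivalMs) {
            this.query = query;
            this.arrivalMs = arrivalMs;
        }
    }

    /** How long to wait before firing a query scheduled at arrivalMs,
     *  given that elapsedMs have already passed (never negative). */
    static long delayMillis(long arrivalMs, long elapsedMs) {
        return Math.max(0, arrivalMs - elapsedMs);
    }

    /** Replay queries at their scheduled offsets (single-threaded sketch;
     *  a real driver would dispatch to a thread pool instead of printing). */
    public static void replay(List<TimedQuery> log) throws InterruptedException {
        long start = System.currentTimeMillis();
        for (TimedQuery tq : log) {
            long elapsed = System.currentTimeMillis() - start;
            Thread.sleep(delayMillis(tq.arrivalMs, elapsed));
            System.out.println("executing: " + tq.query);
        }
    }
}
```

Measuring latency would then mean recording the gap between a query's scheduled arrival and its actual completion, which is what makes the "4 hard queries at once" scenario observable.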

Another thing I miss (which I've worked around with Python scripts on
top) is the ability to save a set of runs, then use it as a baseline
when comparing against another set of runs, and to print the resulting
tables in Jira's markup.
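As a rough illustration of the baseline-comparison idea, a sketch of the reporting half (BaselineReport and jiraTable are made-up names; the input format, a map from task name to {baseline ms, candidate ms}, is an assumption, and the table syntax is standard Jira wiki markup):

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: render a baseline-vs-candidate comparison
// as a Jira-markup table (||...|| is a header row, |...| a data row).
public class BaselineReport {

    /** Map values are {baselineMs, candidateMs} for each task name. */
    static String jiraTable(Map<String, double[]> runs) {
        StringBuilder sb =
            new StringBuilder("||Task||Baseline (ms)||Candidate (ms)||Diff||\n");
        for (Map.Entry<String, double[]> e : runs.entrySet()) {
            double base = e.getValue()[0];
            double cand = e.getValue()[1];
            double pct = (cand - base) / base * 100.0;  // % change vs. baseline
            sb.append(String.format(Locale.ROOT, "|%s|%.1f|%.1f|%+.1f%%|%n",
                    e.getKey(), base, cand, pct));
        }
        return sb.toString();
    }
}
```

The other half, persisting a run's per-task timings so they can be reloaded as the baseline, is just serialization of the same map.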

Mike


Grant Ingersoll wrote:

The build file in the benchmarker has a "run" target that shows how to run it.  The important part to port is the "by task" stuff: http://lucene.apache.org/java/2_4_0/api/contrib-benchmark/org/apache/lucene/benchmark/byTask/package-summary.html
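For orientation, a byTask run is driven by a plain-text algorithm (.alg) file of properties plus a task sequence. A minimal sketch from memory of the 2.4 docs; the exact property names and task parameters should be checked against the package summary above:

```
# properties consumed by the tasks below
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory

# build a small index: sequence in braces, ": N" repeats a task N times
{ "Populate"
  CreateIndex
  { AddDoc } : 1000
  CloseIndex
}
```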




On Feb 6, 2009, at 10:11 PM, Marvin Humphrey wrote:

Greets,

Lucy needs sophisticated search-time benchmarking.  The obvious approach is to
port the Lucene contrib benchmark suite.

However, contrib benchmark has a large number of classes, the documentation is
sparse and occasionally wrong ("Usage: java Benchmark algorithm-file"),
there's no how-to or wiki page (just package.html) ... and one obvious starting
point, the "Benchmarker" class, is deprecated.

What's actually important in the benchmark suite?  Besides "Benchmarker" being
deprecated, there look to be multiple "stats" and "utils" directories. Are
there large chunks of obsolete code that can be safely ignored?

Marvin Humphrey



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika) using Solr/Lucene:
http://www.lucidimagination.com/search




