lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <>
Subject [jira] Updated: (LUCENE-836) Benchmarks Enhancements (precision/recall, TREC, Wikipedia)
Date Fri, 29 Jun 2007 21:37:06 GMT


Doron Cohen updated LUCENE-836:

    Attachment: lucene-836.benchmark.quality.patch

lucene-836.benchmark.quality.patch adds a new package "quality" under o.a.l.benchmark. 

This is also a follow-up to some of
The patch is based on the trunk folder. 
Fastest way to test it: "ant test" from the contrib/benchmark dir.
To see more output in this run, try "ant test -Dtests.verbose=true".

This is early code, not ready to commit - I wanted to show it early for feedback, especially
on the API. 

For a quick view of the API see benchmark.quality at
(note there are not many javadocs yet - I would hold off on that until the API is settled.)

Code in this patch:
  - is extendable.
  - can run a quality benchmark.
  - reports quality results, comparing against given judgments (optional).
  - creates a submission log (optional).
  - the format of the submission log can be modified by extending a logger class.
  - the format of inputs - queries, judgments - can be modified by extending the 
    default readers, or by providing pre-read ones.
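To illustrate the kind of result a quality benchmark reports, here is a minimal
precision/recall computation over a ranked result list. This is just a sketch of
the underlying arithmetic; the class and method names are illustrative, not the
patch's API.

```java
import java.util.*;

class PrecisionRecallSketch {
  // Given the ranked result doc names and the set of judged-relevant doc
  // names, compute precision (hits / results returned) and recall
  // (hits / total relevant) at the full result depth.
  static double[] compute(List<String> results, Set<String> relevant) {
    int hits = 0;
    for (String doc : results) {
      if (relevant.contains(doc)) {
        hits++;
      }
    }
    double precision = results.isEmpty() ? 0 : (double) hits / results.size();
    double recall = relevant.isEmpty() ? 0 : (double) hits / relevant.size();
    return new double[] { precision, recall };
  }
}
```

A real run would compute these at several cutoffs and average over queries, but
the per-query core is this simple.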

There is a general "Judge" interface that answers whether a given doc name is valid for a
given "QualityQuery", and one implementation of it, based on TREC's QRels. Supporting an
alternative judgment format would just mean another implementation of the "Judge" interface.
(I would love a better name for it, btw...)
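The shape described above can be sketched roughly as follows. This is an
assumption-laden outline based on the description, not the patch's actual
interface; all names here are hypothetical.

```java
import java.util.*;

// Hypothetical sketch of the described API; not the committed interface.

/** A query whose result quality is being measured. */
class QualityQuery {
  final String id;
  QualityQuery(String id) { this.id = id; }
}

/** Answers whether a given doc name is valid (relevant) for a query. */
interface Judge {
  boolean isRelevant(String docName, QualityQuery query);
}

/** One possible implementation backed by TREC-QRels-style judgments. */
class QRelJudge implements Judge {
  // queryId -> set of judged-relevant doc names
  private final Map<String, Set<String>> judgments = new HashMap<>();

  void addJudgment(String queryId, String docName) {
    judgments.computeIfAbsent(queryId, k -> new HashSet<>()).add(docName);
  }

  @Override
  public boolean isRelevant(String docName, QualityQuery query) {
    Set<String> relevant = judgments.get(query.id);
    return relevant != null && relevant.contains(docName);
  }
}
```

A judge for another judgment format would simply parse its own input into the
same yes/no answer.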

A new TestQualityRun tests this package on the Reuters collection, so that test source is
a good place to start to see how to run a quality test.

> Benchmarks Enhancements (precision/recall, TREC, Wikipedia)
> -----------------------------------------------------------
>                 Key: LUCENE-836
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Other
>            Reporter: Grant Ingersoll
>            Priority: Minor
>         Attachments: lucene-836.benchmark.quality.patch
> Would be great if the benchmark contrib had a way of providing precision/recall benchmark
information a la TREC.  I don't know what the copyright issues are for the TREC queries/data
(I think the queries are available, but not sure about the data), so not sure if this is even
feasible, but I could imagine we could at least incorporate support for it for those who have
access to the data.  It has been a long time since I have participated in TREC, so perhaps
someone more familiar w/ the latest can fill in the blanks here.
> Another option is to ask for volunteers to create queries and make judgments for the
Reuters data, but that is a bit more complex and probably not necessary.  Even so, an Apache
licensed set of benchmarks may be useful for the community as a whole.  Hmmm.... 
> Wikipedia might be another option instead of Reuters to set up as a download for benchmarking,
as it is quite large and I believe the licensing terms are quite amenable.  Having a larger
collection would be good for stressing Lucene more and would give many users a demonstration
of how Lucene handles large collections.
> At any rate, this kind of information could be useful for people looking at different
indexing schemes, formats, payloads and different query strategies.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

