lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-836) Benchmarks Enhancements (precision/recall, TREC, Wikipedia)
Date Fri, 27 Jul 2007 10:59:18 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516004 ]

Grant Ingersoll commented on LUCENE-836:
----------------------------------------

+1

Applies cleanly and I like the API, but I think you should have a Jury object too...

I can't actually run it w/o the TREC data, but the tests pass.  I think I might have TREC
Arabic lying around somewhere; maybe I will give it a run w/ that some day, but don't wait
on me to apply this.

> Benchmarks Enhancements (precision/recall, TREC, Wikipedia)
> -----------------------------------------------------------
>
>                 Key: LUCENE-836
>                 URL: https://issues.apache.org/jira/browse/LUCENE-836
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Other
>            Reporter: Grant Ingersoll
>            Priority: Minor
>         Attachments: lucene-836.benchmark.quality.patch, lucene-836.benchmark.quality.patch, lucene-836.benchmark.quality.patch
>
>
> Would be great if the benchmark contrib had a way of providing precision/recall benchmark
> information à la TREC.  I don't know what the copyright issues are for the TREC queries/data
> (I think the queries are available, but not sure about the data), so not sure if this is even
> feasible, but I could imagine we could at least incorporate support for it for those who have
> access to the data.  It has been a long time since I have participated in TREC, so perhaps
> someone more familiar w/ the latest can fill in the blanks here.
> Another option is to ask for volunteers to create queries and make judgments for the
> Reuters data, but that is a bit more complex and probably not necessary.  Even so, an
> Apache-licensed set of benchmarks may be useful for the community as a whole.  Hmmm....
> Wikipedia might be another option instead of Reuters to set up as a download for benchmarking,
> as it is quite large and I believe the licensing terms are quite amenable.  Having a larger
> collection would be good for stressing Lucene more and would give many users a demonstration
> of how Lucene handles large collections.
> At any rate, this kind of information could be useful for people looking at different
> indexing schemes, formats, payloads and different query strategies.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



