lucene-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: TREC Collection, NIST and Lucene
Date Mon, 20 Aug 2007 13:15:18 GMT
How does this sound:

Dear ----,

My name is Grant Ingersoll and I am a committer on the Lucene Java
search library (http://lucene.apache.org) at the Apache Software  
Foundation (ASF).  I am not, however, writing in any official  
capacity as a representative of the ASF.  Perhaps at a later date,  
this will change, but for now I just want to keep things informal.

I am interested in starting a discussion about how open
source projects like Lucene could participate in future TREC  
evaluations, or at least gain access to TREC data resources.  While  
the people involved in Lucene feel we have built a top-notch search
system, one of the things the community as a whole lacks is the
ability to do formal evaluations like those TREC offers, and thus research
and development of new algorithms is hindered.  Granted, individuals
may perform TREC evaluations provided they have purchased a license to
the data, but the community as a whole does not have this ability.

I am wondering if there is some way in which we can arrange for open  
source projects to obtain access to the TREC collections.  The  
biggest barrier for projects like Lucene, obviously, is the fee that  
needs to be paid.  Furthermore, there are undoubtedly distribution  
and copyright concerns.  Yet, a part of me feels that we can work  
something out through creative licensing or some other novel approach  
that protects the appropriate interests, furthers TREC's mission and  
supports the vibrant Open Source community around Lucene and other  
search engines.  Perhaps it would be possible to require that any  
participant who wants the TREC data must prove that they are  
appropriately affiliated with an official open source project, as  
defined by the Open Source Initiative (http://www.opensource.org).   
Many tool vendors have similar licenses that allow open source  
participants to use their tool while working on open source projects 
[1].  Perhaps we could provide a similar approach to the TREC data.

I feel this would benefit TREC substantially by providing an open
baseline system for all the world to see, and I think it fits very
well with TREC's motto: "...to encourage research in information
retrieval from large text collections."  Naturally, it benefits
Lucene by allowing Lucene to undertake more formal evaluation of  
relevance, etc.

If you are interested in more background on this on the Lucene Java  
developers mailing list, please refer to
http://www.gossamer-threads.com/lists/lucene/java-dev/52022?search_string=TREC;#52022

I look forward to hearing back from you and I would be more than  
happy to answer any questions you have.

Sincerely,
Grant Ingersoll

[1] JetBrains, Atlassian, Clover Test Coverage, etc.

-------

-Grant





On Aug 10, 2007, at 4:52 AM, Tom White wrote:

>> Furthermore, I think it would
>> encourage Lucene users/developers to think about relevance as much as
>> we think about speed.
>
> +1
>
> However I think it would be much better to start by making informal
> approaches as you suggest - the open letter seems to me to be
> appropriate only as a last resort.
>
> Tom
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>




