lucene-dev mailing list archives

From Grant Ingersoll
Subject Re: TREC Collection, NIST and Lucene
Date Mon, 20 Aug 2007 21:18:06 GMT

On Aug 20, 2007, at 3:01 PM, Dawid Weiss wrote:

> I like it too. And I'm wondering what the response to this will be  
> -- it will in a way show if TREC really stands up to their mission,  
> won't it?

Obviously, I hope the result is positive, but I don't know if  
precluding open source is against their mission.  After all, I do  
understand that it takes time and money to create these collections  
and that should be valued.  By the same token, it is the Federal  
government that is sponsoring the competition and doing a lot of the  
work, so it seems one could argue for making a collection that is  
free from copyright restrictions so that anyone can participate.  It  
is a tricky issue, however, so we can hope for the best.

I will wait a couple more days to solicit comments and then submit it  
to the appropriate people at NIST.


> D.
> Grant Ingersoll wrote:
>> How does this sound:
>> Dear ----,
>> My name is Grant Ingersoll and I am a committer on the Lucene Java  
>> search library ( at the Apache Software  
>> Foundation (ASF).  I am not, however, writing in any official  
>> capacity as a representative of the ASF.  Perhaps at a later date,  
>> this will change, but for now I just want to keep things informal.
>> I am, however, interested in starting a discussion about how open  
>> source projects like Lucene could participate in future TREC  
>> evaluations, or at least gain access to TREC data resources.   
>> While the people involved in Lucene feel we have built a top-notch  
>> search system, one of the things the community as a whole lacks is  
>> the ability to do formal evaluations like TREC offers, and thus  
>> research and development of new algorithms is hindered.  Granted,  
>> individuals may perform TREC evaluations given they have purchased  
>> a license to the data, but the community as a whole does not have  
>> this ability.
>> I am wondering if there is some way in which we can arrange for  
>> open source projects to obtain access to the TREC collections.   
>> The biggest barrier for projects like Lucene, obviously, is the  
>> fee that needs to be paid.  Furthermore, there are undoubtedly  
>> distribution and copyright concerns.  Yet, a part of me feels that  
>> we can work something out through creative licensing or some other  
>> novel approach that protects the appropriate interests, furthers  
>> TREC's mission and supports the vibrant Open Source community  
>> around Lucene and other search engines.  Perhaps it would be  
>> possible to require that any participant who wants the TREC data  
>> must prove that they are appropriately affiliated with an official  
>> open source project, as defined by the Open Source Initiative  
>> (  Many tool vendors have similar  
>> licenses that allow open source participants to use their tool  
>> while working on open source projects[1].  Perhaps we could  
>> provide a similar approach to the TREC data.
>> I feel this would benefit TREC substantially, by providing an  
>> open, baseline system for all the world to see and I see that it  
>> fits very much with the motto of TREC  " encourage research  
>> in information retrieval from large text collections."    
>> Naturally, it benefits Lucene by allowing Lucene to undertake more  
>> formal evaluation of relevance, etc.
>> If you are interested in more background on this on the Lucene  
>> Java developers mailing list, please refer to
>> search_string=TREC;#52022
>> I look forward to hearing back from you  
>> and I would be more than happy to answer any questions you have.
>> Sincerely,
>> Grant Ingersoll
>> [1] JetBrains, Atlassian, Clover Test Coverage, etc.
>> -------
>> -Grant
>> On Aug 10, 2007, at 4:52 AM, Tom White wrote:
>>>> Furthermore, I think it would
>>>> encourage Lucene users/developers to think about relevance as  
>>>> much as
>>>> we think about speed.
>>> +1
>>> However I think it would be much better to start by making informal
>>> approaches as you suggest - the open letter seems to me to be
>>> appropriate only as a last resort.
>>> Tom
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:

Grant Ingersoll
