lucene-openrelevance-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Comments on ORP Wiki Additions ?
Date Thu, 11 Feb 2010 21:42:40 GMT
only a subset of the docs (the top-N from the different submissions) is
placed into a pool and judged.

here is a great little presentation that is very relevant to the ORP project,
as i am sure we don't want to create complete judgments, yet we want reusable
evaluation collections:
http://www.ir.uwaterloo.ca/slides/buettcher_reliable_evaluation.pdf
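
to make this concrete, here is a rough sketch of what pooling looks like in
code. this is purely illustrative Python, not anything from ORP or TREC: the
run format, the pool depth, and the build_pool function name are made up for
the example.

# sketch only: for a single topic, pool the top-N results from each
# submitted run, so assessors judge the union of those documents instead
# of the entire collection.

def build_pool(runs, depth=100):
    """Union of the top-`depth` doc ids across all submitted runs."""
    pool = set()
    for ranked_doc_ids in runs:
        pool.update(ranked_doc_ids[:depth])
    return pool

# three hypothetical systems' rankings for one topic
run_a = ["d7", "d3", "d9", "d1"]
run_b = ["d3", "d2", "d7", "d8"]
run_c = ["d5", "d3", "d1", "d6"]

# with depth=2, only {"d2", "d3", "d5", "d7"} would go to the assessors;
# documents outside the pool are left unjudged (usually treated as
# non-relevant), which is what keeps the judging effort affordable.
print(sorted(build_pool([run_a, run_b, run_c], depth=2)))

the pool depth is the main trade-off: deeper pools cost more assessor time,
while shallow pools risk judgments that aren't reusable for systems that
weren't in the original pool, which is the reusability concern above.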

On Thu, Feb 11, 2010 at 4:31 PM, Mark Bennett <mbennett@ideaeng.com> wrote:

> Hi Robert,
>
> By "pooling", you mean they combine different sets of source docs and
> question sets, in kind of a patchwork?  If that's what you mean, do you
> know how that process was generally done?  How close to "perfection", i.e.
> total coverage by humans, do you think they got?
>
> If that's not what you meant by "pooling" then I'm a bit confused...
>
> Thanks,
>
> Mark
>
> --
> Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>
>
> On Thu, Feb 11, 2010 at 1:02 PM, Robert Muir <rcmuir@gmail.com> wrote:
>
>> in this case pooling is what is typically used.
>>
>>
>> On Thu, Feb 11, 2010 at 3:49 PM, Mark Bennett <mbennett@ideaeng.com> wrote:
>>
>>> Thanks Robert,
>>>
>>> Excellent comments; I'll try to add something to the outline, either a
>>> higher-level top section or some intro text.
>>>
>>> Robert, in particular, I wonder if you could look at:
>>>
>>>
>>> http://cwiki.apache.org/confluence/display/ORP/Relevancy+Assertion+Testing
>>>
>>> In the section on "Full-Grid Assertions (TREC-Style!)"
>>>
>>> It talks about the "M x N" problem of creating relevancy judgment data.
>>> It also explores some of the shortcuts that could be used.
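>>>
>>> (For a sense of scale, with M queries and N documents a full grid needs
>>> M x N judgments: say, 50 queries against even a 100,000-document
>>> collection would already be 5,000,000 individual assessments, which is
>>> why those shortcuts matter.)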
>>>
>>> We're actually working through these problems with a couple of clients.  On
>>> the one hand they want "perfect" measurements, but on the other hand nobody
>>> wants to fund the work to create completely curated test sets.  This is the
>>> classic "good vs. cheap" argument, and I DO think there are reasonable
>>> compromises to be had.
>>>
>>> TREC has evolved over the years, and I wonder how they've addressed these
>>> issues.  Did they take any shortcuts?  Or did they get enough manpower to
>>> really curate every single document and relevancy judgment?
>>>
>>> I'll be adding more about some of the compromises we've considered and
>>> worked on, but it'd be great to get other experts to chime in.  Either y'all
>>> will come back with other ideas we didn't think of, or we get to say "we told
>>> you so" - I'm happy either way.
>>>
>>> And what I love about the ORP process is that all of this is captured and
>>> vetted in an accessible public forum.  TREC was also peer-reviewed, so this
>>> continues that tradition in the newer medium.  And I'll work on an even
>>> clearer outline.
>>>
>>>
>>> Mark
>>>
>>> --
>>> Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
>>> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>>>
>>>
>>> On Thu, Feb 11, 2010 at 11:49 AM, Robert Muir <rcmuir@gmail.com> wrote:
>>>
>>>> first of all, thanks for adding this content!
>>>>
>>>> in my opinion one thing that might be helpful would be an 'introduction'
>>>> section that is VERY high-level. I don't want to sound negative, but your
>>>> 'high-level outline' is actually quite technical :)
>>>>
>>>> it might be a good thing for this project if we had some content
>>>> somewhere that explained at a very very high level what this whole relevance
>>>> testing thing is all about...
>>>>
>>>>
>>>> On Thu, Feb 11, 2010 at 12:58 PM, Mark Bennett <mbennett@ideaeng.com> wrote:
>>>>
>>>>> Good morning Relevancy comrades,
>>>>>
>>>>> I've tried to take a stab at outlining this rather complex subject in
>>>>> the wiki.  Of course it's a work in progress.
>>>>>
>>>>> I've done a high level outline here:
>>>>>
>>>>> http://cwiki.apache.org/confluence/display/ORP/Relevancy+Testing+Outline
>>>>>
>>>>> And an expansion of the first section of the outline here:
>>>>>
>>>>> http://cwiki.apache.org/confluence/display/ORP/Relevancy+Assertion+Testing
>>>>>
>>>>> I actually could use some feedback.  I promise you this is not vanity;
>>>>> there are some very pragmatic motives for my postings.
>>>>>
>>>>> I guess some specific questions:
>>>>> * I'm trying to create a bit of a "crash course" in Relevancy Testing;
>>>>> are there major areas I've overlooked?
>>>>> * I've outlined two broad categories of testing; do you agree?
>>>>> * I've tried to explore some of the high-level strengths and drawbacks
>>>>> of certain methodologies.
>>>>> * Is the "tone" reasonably neutral?  What I mean is that some folks may
>>>>> be attached to certain methods, and I don't want to seem like I'm
>>>>> "trashing" anything, just trying to point out the strengths and
>>>>> weaknesses in a fair way.
>>>>>
>>>>> I look forward to any comments.
>>>>>
>>>>> Mark
>>>>>
>>>>> --
>>>>> Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
>>>>> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Robert Muir
>>>> rcmuir@gmail.com
>>>>
>>>
>>>
>>
>>
>> --
>> Robert Muir
>> rcmuir@gmail.com
>>
>
>


-- 
Robert Muir
rcmuir@gmail.com
