lucene-openrelevance-user mailing list archives

From Mark Bennett <mbenn...@ideaeng.com>
Subject Re: Comments on ORP Wiki Additions ?
Date Thu, 11 Feb 2010 20:49:43 GMT
Thanks Robert,

Excellent comments.  I'll try to add something to the outline: either a
higher-level top section or some intro text.

Robert, in particular, I wonder if you could look at:
http://cwiki.apache.org/confluence/display/ORP/Relevancy+Assertion+Testing

In the section on "Full-Grid Assertions (TREC-Style!)"

It talks about the "M x N" problem of creating relevancy judgment data.  It
also explores some of the shortcuts that could be used.
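
To make the "M x N" cost concrete, here is a minimal sketch of the arithmetic.
The query and document counts are made-up numbers, and the "judge only the
top k results per query" shortcut is just one illustrative assumption, not
necessarily one of the shortcuts described on the wiki page.

```python
def full_grid_judgments(num_queries: int, num_docs: int) -> int:
    """Judgments needed if every query is judged against every document."""
    return num_queries * num_docs


def top_k_judgments(num_queries: int, num_docs: int, k: int) -> int:
    """Judgments needed if only the top k results per query are judged.

    Hypothetical shortcut for illustration; k is capped at the corpus size.
    """
    return num_queries * min(k, num_docs)


if __name__ == "__main__":
    # Hypothetical sizes: M = 50 queries, N = 100,000 documents.
    m, n = 50, 100_000
    print("full grid:", full_grid_judgments(m, n))
    print("top-20 only:", top_k_judgments(m, n, k=20))
```

Even with a modest query set, the full grid runs into millions of judgments,
which is why some kind of sampling compromise is tempting.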

We're actually working through these problems with a couple of clients.  On
the one hand they want "perfect" measurements, but on the other hand nobody
wants to fund the work to create completely curated test sets.  This is the
classic "good vs. cheap" argument, and I DO think there are reasonable
compromises to be had.

TREC has evolved over the years and I wonder how they've addressed these.
Did they take any shortcuts?  Or did they get enough manpower to really
curate every single document and relevancy judgment?

I'll be adding more about some of the compromises we've considered and
worked on, but it'd be great to get other experts to chime in.  Either y'all
will come back with other ideas we didn't think of, or we get to say "we told
you so" - I'm happy either way.

And what I love about the ORP process is that all of this is captured and
vetted in an accessible public forum.  TREC was also peer reviewed, so this
continues that tradition in the newer medium.  And I'll work on an even
clearer outline.

Mark

--
Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Thu, Feb 11, 2010 at 11:49 AM, Robert Muir <rcmuir@gmail.com> wrote:

> first of all, thanks for adding this content!
>
> in my opinion one thing that might be helpful would be an 'introduction'
> section that is VERY high-level. I don't want to sound negative but your
> 'high level outline' is actually quite technical :)
>
> it might be a good thing for this project if we had some content somewhere
> that explained at a very very high level what this whole relevance testing
> thing is all about...
>
>
> On Thu, Feb 11, 2010 at 12:58 PM, Mark Bennett <mbennett@ideaeng.com> wrote:
>
>> Good morning Relevancy comrades,
>>
>> I've tried to take a stab at outlining this rather complex subject in the
>> wiki.  Of course it's a work in progress.
>>
>> I've done a high level outline here:
>> http://cwiki.apache.org/confluence/display/ORP/Relevancy+Testing+Outline
>>
>> And an expansion of the first section of the outline here:
>> http://cwiki.apache.org/confluence/display/ORP/Relevancy+Assertion+Testing
>>
>> I actually could use some feedback.  I promise you this is not vanity,
>> there are actually some very pragmatic motives for my postings.
>>
>> I guess some specific questions:
>> * I'm trying to create a bit of a "crash course" in Relevancy Testing, are
>> there major areas I've overlooked?
>> * I've outlined 2 broad categories of testing, do you agree?
>> * I've tried to explore some of the high level strengths and drawbacks of
>> certain methodologies
>> * Is the "tone" reasonably neutral?  What I mean is that some folks may be
>> attached to certain methods, I don't want to seem like I'm "trashing"
>> anything, just trying to point out the strengths and weaknesses in a fair
>> way.
>>
>> I look forward to any comments.
>>
>> Mark
>>
>> --
>> Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
>> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>>
>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>
