lucene-openrelevance-user mailing list archives

From Mark Bennett <mbennett@ideaeng.com>
Subject Re: or-user perspective, teams, etc (was: Comments on ORP Wiki Additions ?)
Date Fri, 12 Feb 2010 17:45:19 GMT
Hi Robert,

Discussing a few more of your terms: by "or-user" you mean folks who
research search engine relevancy almost full-time?  And then "or-dev" as the
folks who write those tools?  Even counting TREC alums, academics, and search
engine vendor CTOs, that seems like a rather specialized, relatively small
group?

And this is in contrast to ORP folks, who are specifically interested in
those roles in our merry little band?

The "teams" aspect of TREC is interesting one.  I'm mostly comfortable with
letting that organically evolve into the opt-in model of ASF.  If academic
teams or vendors want to participate, that's fine, if not, that's OK too (as
far as I'm concerned).  Perhaps we should simply ask for disclosure when
somebody wants to contribute.  There was some tension between TREC and
commercial vendors in later years, as I understand it.  Maybe having an
additional venue will catch some of their attention.

I'm probably coming at this from a slightly different angle, as I don't
research "relevancy" full-time; it's just one of the aspects of search engine
tech that companies care about.  And of course I also find it personally
interesting, though my expectations for it have changed over the years.

Some aspects of TREC that we might want to consider in the ASF model
include:
* What do we do when folks make outrageous claims about their engine's
performance and claim ORP validation?  I DO think we want to allow folks to
make some public claims, if they are justified; I'd just like to see some
guidelines about disclosure and vetting.
* How do we "market" to the vendors and academics?  The opt-in model is not
about coercion, but you can't opt in to something you don't know about.  :-)

I've previously disclosed that I'm a search engine consultant, and our
participation in ORP is often driven by what clients ask us about.  Some
examples include:
* Comparing engine A to engine B, after one or more engines have made
relevancy claims
* Checking a new search engine implementation, as part of a larger user
acceptance process
* Seeing the results of various relevancy tweaking techniques within the
same engine; sometimes you can fix one problematic search and unknowingly
break 10 others!
* Initial validation against new content, or content in multiple
languages
* Having some type of baseline relevancy test that can be included in unit
testing / regression testing (a rough sketch of this follows below)
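
To make that last bullet concrete, here's the kind of test I have in mind.
Everything here is hypothetical (the SearchEngine interface, the judgment
sets, the baseline value); it's just to show the shape:

import java.util.List;
import java.util.Set;

// Hypothetical sketch: fail the build if precision@10 for a known
// query drops below a previously observed baseline.
public class RelevancyRegressionTest {

    // Stand-in for whatever engine adapter is actually in use.
    interface SearchEngine {
        List<String> search(String query, int topN);  // ranked doc ids
    }

    // Fraction of the top n results that appear in the relevant set.
    static double precisionAtN(List<String> results, Set<String> relevant, int n) {
        int hits = 0;
        for (int i = 0; i < Math.min(n, results.size()); i++) {
            if (relevant.contains(results.get(i))) {
                hits++;
            }
        }
        return (double) hits / n;
    }

    // Run one query and compare against the stored baseline; a test
    // suite would call this once per curated query.
    static void checkBaseline(SearchEngine engine, String query,
                              Set<String> relevantDocs, double baseline) {
        double p10 = precisionAtN(engine.search(query, 10), relevantDocs, 10);
        if (p10 < baseline) {
            throw new AssertionError("precision@10 for \"" + query + "\" fell to "
                    + p10 + " (baseline " + baseline + ")");
        }
    }
}

The point is just that a relevancy score can become a pass/fail check like
any other regression test, so a tweak that fixes one search and silently
breaks ten others gets caught.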

I realize these are very different motives than an academic who's devoted 30
or 40 years to studying IR.  Compared to that level of detail, these tests
could be called a "drive-by".  :-)  As I've previously mentioned, convenient
interaction with multiple engines is a paramount concern.  I've started the
process of actually contributing some code that does this.  I assume I'd use
the JIRA/patch route for this.

Mark

--
Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Thu, Feb 11, 2010 at 5:28 PM, Robert Muir <rcmuir@gmail.com> wrote:

> I am sure there are better (much older) papers to describe pooling that
> might help in addition.
>
> that paper is more from an or-user perspective, a guide to getting more
> realistic use out of a test collection.
>
> we should separately look for some good stuff from an or-dev perspective if
> we want to properly create pooled judgements for a test collection. it's a
> little strange: who would be in the pool? (it's not like a trec conference
> where you have a finite set of teams and they are all getting pooled
> equally)
>
>
> On Thu, Feb 11, 2010 at 7:42 PM, Mark Bennett <mbennett@ideaeng.com> wrote:
>
>> Robert,
>>
>> That link was awesome, thank you!  I've added it to the detailed page.
>>
>> Also, I've taken a stab at an Introduction on the outline page.  Oddly,
>> Confluence seems to need manual refreshing more than other wikis I use, even
>> days later.  I wonder if there's a cache setting or something...
>>
>> With regards to the outline, in some places it's perhaps more terse than
>> technical.  In other projects I've found that even an incomplete outline can
>> evolve into a great resource.
>>
>> With regards to code, I've been working on some stuff that interacts with
>> multiple search engines in their native formats and translates them into a
>> common Atom feed, along the lines of the OpenSearch format.  This is in the
>> "you want it, you build it" category.  Our interest in ORP is very
>> cross-engine centric.  A very rough sketch of the shape of it (all the
>> names here are placeholders, not the actual code):
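>>
>> import java.util.List;
>>
>> // Rough sketch only: each engine gets a connector that speaks its
>> // native API; the adapter renders the engine-neutral hits as an Atom
>> // feed with an OpenSearch-style totalResults element.
>> public class AtomFeedAdapter {
>>
>>     // One hit in engine-neutral form.
>>     static class Hit {
>>         final String title, url;
>>         Hit(String title, String url) { this.title = title; this.url = url; }
>>     }
>>
>>     interface EngineConnector {
>>         List<Hit> query(String q);  // engine-specific translation lives here
>>     }
>>
>>     // Render hits as Atom (XML escaping omitted for brevity).
>>     static String toAtom(String q, List<Hit> hits) {
>>         StringBuilder sb = new StringBuilder();
>>         sb.append("<feed xmlns=\"http://www.w3.org/2005/Atom\"\n");
>>         sb.append("      xmlns:os=\"http://a9.com/-/spec/opensearch/1.1/\">\n");
>>         sb.append("  <title>Results for ").append(q).append("</title>\n");
>>         sb.append("  <os:totalResults>").append(hits.size()).append("</os:totalResults>\n");
>>         for (Hit h : hits) {
>>             sb.append("  <entry>\n");
>>             sb.append("    <title>").append(h.title).append("</title>\n");
>>             sb.append("    <link href=\"").append(h.url).append("\"/>\n");
>>             sb.append("  </entry>\n");
>>         }
>>         return sb.append("</feed>\n").toString();
>>     }
>> }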
>>
>> Still lots of details to work through.  If anybody knows XSLT *really*
>> well, I'd like to bend their ear; I'm having some issues with namespaces.
>>
>>
>> --
>> Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
>> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>>
>>
>> On Thu, Feb 11, 2010 at 1:42 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>
>>> only a partial subset of the docs (some top-N from different submissions)
>>> are placed into a pool and judged.
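>>>
>>> very roughly, pooling a single topic looks something like this (the
>>> pool depth and the run representation are made up for illustration):
>>>
>>> import java.util.LinkedHashSet;
>>> import java.util.List;
>>> import java.util.Set;
>>>
>>> // Sketch of pooling: union the top-N documents from each submitted
>>> // run; only the pooled docs get human judgments, and anything
>>> // outside the pool is treated as non-relevant.
>>> public class Pooler {
>>>     static Set<String> pool(List<List<String>> rankedRuns, int depth) {
>>>         Set<String> pooled = new LinkedHashSet<String>();
>>>         for (List<String> run : rankedRuns) {
>>>             pooled.addAll(run.subList(0, Math.min(depth, run.size())));
>>>         }
>>>         return pooled;
>>>     }
>>> }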
>>>
>>> here is a great little presentation that is very relevant to the ORP
>>> project, as i am sure we don't want to create complete judgements, yet we
>>> want reusable evaluation collections:
>>> http://www.ir.uwaterloo.ca/slides/buettcher_reliable_evaluation.pdf
>>>
>>>
>>> On Thu, Feb 11, 2010 at 4:31 PM, Mark Bennett <mbennett@ideaeng.com> wrote:
>>>
>>>> Hi Robert,
>>>>
>>>> By "pooling", you mean they combine different sets of source docs and
>>>> question sets, in kind of a patch work?  If that's what you mean, do you
>>>> know how that process was generally done?  How close to "perfection", ie
>>>> total coverage by humans, do you think they got?
>>>>
>>>> If that's not what you meant by "pooling" then I'm a bit confused...
>>>>
>>>> Thanks,
>>>>
>>>> Mark
>>>>
>>>> --
>>>> Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
>>>> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>>>>
>>>>
>>>> On Thu, Feb 11, 2010 at 1:02 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>>>
>>>>> in this case pooling is what is typically used.
>>>>>
>>>>>
>>>>> On Thu, Feb 11, 2010 at 3:49 PM, Mark Bennett <mbennett@ideaeng.com> wrote:
>>>>>
>>>>>> Thanks Robert,
>>>>>>
>>>>>> Excellent comments; I'll try to add something to the outline: either
>>>>>> a higher-level top section or some intro text.
>>>>>>
>>>>>> Robert, in particular, I wonder if you could look at:
>>>>>>
>>>>>>
>>>>>> http://cwiki.apache.org/confluence/display/ORP/Relevancy+Assertion+Testing
>>>>>>
>>>>>> In the section on "Full-Grid Assertions (TREC-Style!)"
>>>>>>
>>>>>> It talks about the "M x N" problem of creating relevancy judgment
>>>>>> data.  It also explores some of the shortcuts that could be used.
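>>>>>>
>>>>>> (Just to put made-up numbers on the scale: 50 queries against a
>>>>>> 100,000-document collection would mean 5,000,000 cells in the grid
>>>>>> for full coverage; at even 10 seconds per judgment that's on the
>>>>>> order of seven person-years of assessor time, which is why the
>>>>>> shortcuts matter.)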
>>>>>>
>>>>>> We're actually working through these problems with a couple clients.
>>>>>> On the one hand they want "perfect" measurements, but on the other hand
>>>>>> nobody wants to fund the work to create completely curated test sets.
>>>>>> This is the classic "good vs. cheap" argument, and I DO think there are
>>>>>> reasonable compromises to be had.
>>>>>>
>>>>>> TREC has evolved over the years and I wonder how they've addressed
>>>>>> these.  Did they take any shortcuts?  Or did they get enough manpower to
>>>>>> really curate every single document and relevancy judgment?
>>>>>>
>>>>>> I'll be adding more about some of the compromises we've considered and
>>>>>> worked on, but it'd be great to get other experts to chime in.  Either
>>>>>> y'all will come back with other ideas we didn't think of, or we get to
>>>>>> say "we told you so" - I'm happy either way.
>>>>>>
>>>>>> And what I love about the ORP process is that all of this is captured
>>>>>> and vetted in an accessible public forum.  TREC was also peer reviewed,
>>>>>> so this continues that tradition in the newer medium.  And I'll work on
>>>>>> an even clearer outline.
>>>>>>
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> --
>>>>>> Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
>>>>>> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 11, 2010 at 11:49 AM, Robert Muir <rcmuir@gmail.com> wrote:
>>>>>>
>>>>>>> first of all, thanks for adding this content!
>>>>>>>
>>>>>>> in my opinion one thing that might be helpful would be an
>>>>>>> 'introduction' section that is VERY high-level. I don't want to sound
>>>>>>> negative but your 'high level outline' is actually quite technical :)
>>>>>>>
>>>>>>> it might be a good thing for this project if we had some content
>>>>>>> somewhere that explained at a very very high level what this whole
>>>>>>> relevance testing thing is all about...
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Feb 11, 2010 at 12:58 PM, Mark Bennett <mbennett@ideaeng.com> wrote:
>>>>>>>
>>>>>>>> Good morning Relevancy comrades,
>>>>>>>>
>>>>>>>> I've tried to take a stab at outlining this rather complex subject
>>>>>>>> in the wiki.  Of course it's a work in progress.
>>>>>>>>
>>>>>>>> I've done a high level outline here:
>>>>>>>>
>>>>>>>> http://cwiki.apache.org/confluence/display/ORP/Relevancy+Testing+Outline
>>>>>>>>
>>>>>>>> And an expansion of the first section of the outline here:
>>>>>>>>
>>>>>>>> http://cwiki.apache.org/confluence/display/ORP/Relevancy+Assertion+Testing
>>>>>>>>
>>>>>>>> I actually could use some feedback.  I promise you this is not
>>>>>>>> vanity; there are actually some very pragmatic motives for my
>>>>>>>> postings.
>>>>>>>>
>>>>>>>> I guess some specific questions:
>>>>>>>> * I'm trying to create a bit of a "crash course" in Relevancy
>>>>>>>> Testing: are there major areas I've overlooked?
>>>>>>>> * I've outlined 2 broad categories of testing, do you agree?
>>>>>>>> * I've tried to explore some of the high-level strengths and
>>>>>>>> drawbacks of certain methodologies
>>>>>>>> * Is the "tone" reasonably neutral?  What I mean is that some folks
>>>>>>>> may be attached to certain methods; I don't want to seem like I'm
>>>>>>>> "trashing" anything, just trying to point out the strengths and
>>>>>>>> weaknesses in a fair way.
>>>>>>>>
>>>>>>>> I look forward to any comments.
>>>>>>>>
>>>>>>>> Mark
>>>>>>>>
>>>>>>>> --
>>>>>>>> Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
>>>>>>>> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Robert Muir
>>>>>>> rcmuir@gmail.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Robert Muir
>>>>> rcmuir@gmail.com
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Robert Muir
>>> rcmuir@gmail.com
>>>
>>
>>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>
