From Robert Muir <rcm...@gmail.com>
Subject Re: or-user perspective, teams, etc (was: Comments on ORP Wiki Additions ?)
Date Fri, 12 Feb 2010 18:29:28 GMT
hi, some comments inline...

On Fri, Feb 12, 2010 at 12:45 PM, Mark Bennett <mbennett@ideaeng.com> wrote:

> Hi Robert,
>
> Discussing a few more of your terms: by "or-user" you mean folks who
> research search engine relevancy almost full-time?  And then or-dev as the
> folks who write those tools?  Even counting TREC alums, academics, and search
> engine vendor CTOs, that seems like a rather specialized, relatively small
> group?
>

i just mean 'consumer of an openrelevance test collection'... ideally
someone with almost no background knowledge could use our work!


>
> And this is in contrast to ORP, folks who are specifically interested in those
> roles in our merry little band?
>
> The "teams" aspect of TREC is interesting one.  I'm mostly comfortable with
> letting that organically evolve into the opt-in model of ASF.  If academic
> teams or vendors want to participate, that's fine, if not, that's OK too (as
> far as I'm concerned).  Perhaps we should simply ask for disclosure when
> somebody wants to contribute.  There was some tension between TREC and
> commercial vendors in later years, as I understand it.  Maybe having an
> additional venue will catch some of their attention.
>

i guess i looked at this more like we would aim to produce reusable test
collections versus having a "venue"? if someone wants a venue for IR
research they already have TREC, CLEF, FIRE, ... but hey, whatever works!


> I'm probably coming at this from a slightly different angle, as I don't
> research "relevancy" full-time; it's just one of the aspects of search
> engine tech that companies care about.  And of course I also find it
> personally interesting, though my expectations for it have changed over the
> years.
>
> Some aspects of TREC that we might want to consider in the ASF model
> include:
> * What do we do when folks make outrageous claims about their engine's
> performance and claim ORP validation?  I DO think we want to allow folks
> to make some public claims, if they are justified; I'd just like to see some
> guidelines about disclosure and vetting.
>

i mean, who cares? with any test collection out there, you can always
'cheat' and tune your shit to get absurdly high scores... just like you can
blow away micro-benchmarks for system performance and appear really fast,
while overall your system sucks because you made stupid tradeoffs and
assumptions. it's no different.


> * How do we "market" to the vendors and academics.  The opt-in model is not
> about coercion, but you can't opt in to something you don't know about.  :-)
>

why do we need them? I don't understand why they need to "opt-in". do you
mean that if they were participating we would have a more diverse/better
pool?

we don't need their permission: e.g. why couldn't I take a corpus and put it
online somewhere, wait for google to index it, then take the top 1,000 docs
from each query and add them to a pool for judging?

I can do the same with any commercial engine; they can claim whatever they
want, but they certainly don't own our docIDs or the search results on our own
docs!
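
to make that concrete, here's a rough sketch of how such a pool could be
built from several engines' results (the run-file format, filenames, and
pool depth below are just assumptions for illustration, nothing decided):

    # sketch: build a judging pool from several engines' ranked results.
    # assumes each run file has lines of "query_id doc_id rank" -- a
    # made-up format, purely for illustration.
    from collections import defaultdict

    def read_run(path):
        """Parse one run file into {query_id: [doc_id, ...]} in rank order."""
        by_query = defaultdict(list)
        with open(path) as f:
            for line in f:
                query_id, doc_id, rank = line.split()
                by_query[query_id].append((int(rank), doc_id))
        return {q: [d for _, d in sorted(pairs)] for q, pairs in by_query.items()}

    def build_pool(run_paths, depth=1000):
        """Union the top-`depth` docs per query across all runs."""
        pool = defaultdict(set)
        for path in run_paths:
            for query_id, docs in read_run(path).items():
                pool[query_id].update(docs[:depth])
        return pool

    pool = build_pool(["engine_a.run", "engine_b.run", "google.run"])
    for query_id, doc_ids in sorted(pool.items()):
        print(query_id, len(doc_ids), "docs to judge")

the point being: the pool is just a per-query union of top-N results, and
no engine's cooperation (or permission) is required to build it.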


>
> I've previously disclosed that I'm a search engine consultant, and our
> participation in ORP is often driven by what clients ask us about.  Some
> examples include:
> * Comparing engine A to engine B, after one or more engines have made
> relevancy claims
>
> * Checking a new search engine implementation, as part of a larger user
> acceptance process
> * Seeing the results of various relevancy tweaking techniques within the
> same engine; sometimes you can fix one problematic search and unknowingly
> break 10 others!
> * Initial validation against new content, or content in multiple
> languages
> * Having some type of baseline relevancy test that can be included in unit
> testing / regression testing
>
> I realize these are very different motives from those of an academic who's devoted
> 30 or 40 years to studying IR.  Compared to that level of detail, these tests
> could be called a "drive-by".  :-)  As I've previously mentioned, convenient
> interaction with multiple engines is a paramount concern.  I've started the
> process of actually contributing some code that does this.  I assume I'd use
> the JIRA/patch route for this.
>

yes, please!
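
(and on your unit/regression testing bullet: even something as small as
this sketch could catch relevancy regressions in a build. the function
names and the qrels format -- "query_id doc_id judgment" per line -- are
hypothetical, just to show the shape of such a test:)

    # sketch: a minimal relevancy regression check -- fail the build if
    # precision@k for any tracked query drops below a recorded floor.
    def load_qrels(path):
        """{(query_id, doc_id): judgment} from "query_id doc_id judgment" lines."""
        qrels = {}
        with open(path) as f:
            for line in f:
                query_id, doc_id, judgment = line.split()
                qrels[(query_id, doc_id)] = int(judgment)
        return qrels

    def precision_at(k, query_id, ranked_doc_ids, qrels):
        """Fraction of the top-k results judged relevant (judgment > 0)."""
        top = ranked_doc_ids[:k]
        hits = sum(1 for d in top if qrels.get((query_id, d), 0) > 0)
        return hits / float(k)

    def check_baselines(search_fn, qrels, floors, k=10):
        """search_fn(query_id) -> ranked doc ids; floors = {query_id: min p@k}."""
        for query_id, floor in floors.items():
            p = precision_at(k, query_id, search_fn(query_id), qrels)
            assert p >= floor, "p@%d for query %s fell to %.2f (floor %.2f)" % (
                k, query_id, p, floor)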


>
> Mark
>
> --
> Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>
>
> On Thu, Feb 11, 2010 at 5:28 PM, Robert Muir <rcmuir@gmail.com> wrote:
>
>> I am sure there are better (much older) papers describing pooling that
>> might also help.
>>
>> that paper is more from an or-user perspective, a guide to getting more
>> realistic use out of a test collection.
>>
>> we should separately look for some good stuff from an or-dev perspective.
>> if we want to properly create pooled judgements for a test collection, it's a
>> little strange: who would be in the pool? (it's not like a TREC conference
>> where you have a finite set of teams and they are all getting pooled
>> equally)
>>
>>
>> On Thu, Feb 11, 2010 at 7:42 PM, Mark Bennett <mbennett@ideaeng.com> wrote:
>>
>>> Robert,
>>>
>>> That link was awesome, thank you!  I've added it to the detailed page.
>>>
>>> Also, I've taken a stab at an Introduction on the outline page.  Oddly,
>>> Confluence seems to need manual refreshing more than other wikis I use, even
>>> days later.  I wonder if there's a cache setting or something...
>>>
>>> With regards to the outline, in some places it's perhaps more terse than
>>> technical.  In other projects I've found that even an incomplete outline can
>>> evolve into a great resource.
>>>
>>> With regards to code, I've been working on some stuff that interacts with
>>> multiple search engines in their native format and translates into a common
>>> Atom feed, along the lines of the OpenSearch format.  This is in the "you
>>> want it, you build it" spirit.  Our interest in ORP is very cross-engine centric.
>>>
>>> Still lots of details to work through.  If anybody knows XSLT *really*
>>> well, I'd like to bend their ear; I'm having some issues with namespaces.
>>>
>>>
>>> --
>>> Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
>>> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>>>
>>>
>>> On Thu, Feb 11, 2010 at 1:42 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>>
>>>> only a subset of the docs (some top-N from different
>>>> submissions) is placed into a pool and judged.
>>>>
>>>> here is a great little presentation that is very relevant to the ORP
>>>> project, as i am sure we don't want to create complete judgements, yet we
>>>> want reusable evaluation collections:
>>>> http://www.ir.uwaterloo.ca/slides/buettcher_reliable_evaluation.pdf
>>>>
>>>>
>>>> On Thu, Feb 11, 2010 at 4:31 PM, Mark Bennett <mbennett@ideaeng.com> wrote:
>>>>
>>>>> Hi Robert,
>>>>>
>>>>> By "pooling", you mean they combine different sets of source docs and
>>>>> question sets, in kind of a patch work?  If that's what you mean, do
you
>>>>> know how that process was generally done?  How close to "perfection",
ie
>>>>> total coverage by humans, do you think they got?
>>>>>
>>>>> If that's not what you meant by "pooling" then I'm a bit confused...
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Mark
>>>>>
>>>>> --
>>>>> Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
>>>>> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>>>>>
>>>>>
>>>>> On Thu, Feb 11, 2010 at 1:02 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>>>>
>>>>>> in this case pooling is what is typically used.
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 11, 2010 at 3:49 PM, Mark Bennett <mbennett@ideaeng.com> wrote:
>>>>>>
>>>>>>> Thanks Robert,
>>>>>>>
>>>>>>> Excellent comments, I'll try to add something to the outline.  Either
>>>>>>> a higher level top section, or some intro text.
>>>>>>>
>>>>>>> Robert, in particular, I wonder if you could look at:
>>>>>>>
>>>>>>>
>>>>>>> http://cwiki.apache.org/confluence/display/ORP/Relevancy+Assertion+Testing
>>>>>>>
>>>>>>> In the section on "Full-Grid Assertions (TREC-Style!)"
>>>>>>>
>>>>>>> It talks about the "M x N" problem of creating relevancy judgment
>>>>>>> data.  It also explores some of the shortcuts that could be used.
>>>>>>>
>>>>>>> We're actually working through these problems with a couple of clients.
>>>>>>> On the one hand they want "perfect" measurements, but on the other hand
>>>>>>> nobody wants to fund the work to create completely curated test sets.  This
>>>>>>> is the classic "good vs. cheap" argument, and I DO think there are
>>>>>>> reasonable compromises to be had.
>>>>>>>
>>>>>>> TREC has evolved over the years and I wonder how they've addressed
>>>>>>> these.  Did they take any shortcuts?  Or did they get enough manpower to
>>>>>>> really curate every single document and relevancy judgment?
>>>>>>>
>>>>>>> I'll be adding more about some of the compromises we've considered
>>>>>>> and worked on, but it'd be great to get other experts to chime in.  Either
>>>>>>> y'all will come back with other ideas we didn't think of, or we get to say "we
>>>>>>> told you so" - I'm happy either way.
>>>>>>>
>>>>>>> And what I love about the ORP process is that all of this is captured
>>>>>>> and vetted in an accessible public forum.  TREC was also peer reviewed, so
>>>>>>> this continues that tradition in the newer medium.  And I'll work on an even
>>>>>>> clearer outline.
>>>>>>>
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>> --
>>>>>>> Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
>>>>>>> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Feb 11, 2010 at 11:49 AM, Robert Muir <rcmuir@gmail.com> wrote:
>>>>>>>
>>>>>>>> first of all, thanks for adding this content!
>>>>>>>>
>>>>>>>> in my opinion one thing that might be helpful would be an
>>>>>>>> 'introduction' section that is VERY high-level. I don't want to sound
>>>>>>>> negative, but your 'high level outline' is actually quite technical :)
>>>>>>>>
>>>>>>>> it might be a good thing for this project if we had some content
>>>>>>>> somewhere that explained at a very very high level what this whole relevance
>>>>>>>> testing thing is all about...
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Feb 11, 2010 at 12:58 PM, Mark Bennett <mbennett@ideaeng.com> wrote:
>>>>>>>>
>>>>>>>>> Good morning Relevancy comrades,
>>>>>>>>>
>>>>>>>>> I've tried to take a stab at outlining this rather complex subject
>>>>>>>>> in the wiki.  Of course it's a work in progress.
>>>>>>>>>
>>>>>>>>> I've done a high level outline here:
>>>>>>>>>
>>>>>>>>> http://cwiki.apache.org/confluence/display/ORP/Relevancy+Testing+Outline
>>>>>>>>>
>>>>>>>>> And an expansion of the first section of the outline here:
>>>>>>>>>
>>>>>>>>> http://cwiki.apache.org/confluence/display/ORP/Relevancy+Assertion+Testing
>>>>>>>>>
>>>>>>>>> I actually could use some feedback.  I promise you this is not
>>>>>>>>> vanity; there are actually some very pragmatic motives for my postings.
>>>>>>>>>
>>>>>>>>> I guess some specific questions:
>>>>>>>>> * I'm trying to create a bit of a "crash course" in Relevancy
>>>>>>>>> Testing; are there major areas I've overlooked?
>>>>>>>>> * I've outlined 2 broad categories of testing, do you agree?
>>>>>>>>> * I've tried to explore some of the high level strengths and
>>>>>>>>> drawbacks of certain methodologies
>>>>>>>>> * Is the "tone" reasonably neutral?  What I mean is that some folks
>>>>>>>>> may be attached to certain methods; I don't want to seem like I'm "trashing"
>>>>>>>>> anything, just trying to point out the strengths and weaknesses in a fair
>>>>>>>>> way.
>>>>>>>>>
>>>>>>>>> I look forward to any comments.
>>>>>>>>>
>>>>>>>>> Mark
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
>>>>>>>>> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Robert Muir
>>>>>>>> rcmuir@gmail.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Robert Muir
>>>>>> rcmuir@gmail.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Robert Muir
>>>> rcmuir@gmail.com
>>>>
>>>
>>>
>>
>>
>> --
>> Robert Muir
>> rcmuir@gmail.com
>>
>
>


-- 
Robert Muir
rcmuir@gmail.com
