lucene-openrelevance-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: or-user perspective, teams, etc (was: Comments on ORP Wiki Additions ?)
Date Sat, 13 Feb 2010 16:36:10 GMT
Inline

On Feb 12, 2010, at 12:45 PM, Mark Bennett wrote:

> Hi Robert,
> 
> Discussing a few more of your terms: by "or-user" you mean folks who research search
engine relevancy almost full-time?  And then or-dev as the folks who write those tools?  Even
counting TREC alums, academics, and search engine vendor CTOs, that seems like a rather specialized,
relatively small group?
> 
> And this in contrast to ORP: folks who are specifically interested in those roles in
our merry little band?
> 
> The "teams" aspect of TREC is an interesting one.  I'm mostly comfortable with letting that
organically evolve into the opt-in model of the ASF.  If academic teams or vendors want to participate,
that's fine; if not, that's OK too (as far as I'm concerned).  Perhaps we should simply ask
for disclosure when somebody wants to contribute.  There was some tension between TREC and
commercial vendors in later years, as I understand it.  Maybe having an additional venue will
catch some of their attention.

The ASF is about individuals.  Companies may fund individuals to work on the ASF, but at the
end of the day sponsorship gets you a link on the sponsors page and shows that your values
are in line with the ASF and that it is something a company wishes to show support for, but
nothing else.  Thus, we don't need disclosure for anybody wanting to contribute.  Well, check
that.  We need disclosure that they are in fact the individual they say they are, but even
this is taken on face value.  The ASF strives to be vendor neutral.  We should evaluate
contributions solely on their merit.  It is the committer's job to make such an evaluation
on behalf of the ASF, and then it is the PMC's job to make sure said contributions, when part
of an official release, are correct and legal to the best of our knowledge.

> 
> I'm probably coming at this from a slightly different angle, as I don't research "relevancy"
full-time; it's just one of the aspects of search engine tech that companies care about.
And of course I also find it personally interesting, though my expectations for it have changed
over the years.
> 
> Some aspects of TREC that we might want to consider in the ASF model include:
> * What do we do when folks make outrageous claims about their engine's performance while
claiming ORP validation?  I DO think we want to allow folks to make some public claims, if
they are justified; I'd just like to see some guidelines about disclosure and vetting.

We don't need to do anything.  It's not our job to police.  Our job is to produce tools for
people to perform relevance assessments under the Apache license.  The marketplace will take
care of exposing the fallacy of any ill-gotten results. 

Again, however, remember that while ORP may be useful in a broader sense, it is not a requirement.
 ORP is set up to give people a way to talk about relevance in a completely open way.  If someone
chooses to exaggerate claims or abuse it, that's their problem for being an idiot, not ours.

> * How do we "market" to the vendors and academics?  The opt-in model is not about coercion,
but you can't opt in to something you don't know about.  :-)

We don't need to market ourselves at all.  If people find it of value, then they will come
and pitch in.  If it's not of value, they won't.  Our focus, again, should be on producing
competent code and good content which can be used to rigorously study relevance.

> 
> I've previously disclosed that I'm a search engine consultant, and our participation
in ORP is often driven by what clients ask us about.  Some examples include:
> * Comparing engine A to engine B, after one or more engines have made relevancy claims
> * Checking a new search engine implementation, as part of a larger user acceptance process
> * Seeing the results of various relevancy tweaking techniques within the same engine;
sometimes you can fix one problematic search and unknowingly break 10 others!
> * Initial validation against new content or content in multiple languages
> * Having some type of baseline relevancy test that can be included in unit testing /
regression testing
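The last bullet above, a baseline relevancy test wired into a regression suite, could be sketched roughly like this.  This is purely illustrative, not ORP code; the names (`GOLD_JUDGMENTS`, `run_query`, the threshold) are hypothetical stand-ins for whatever an engine and judgment set would actually provide:

```python
# Hypothetical sketch of a baseline relevance regression check.
# GOLD_JUDGMENTS and run_query are illustrative stand-ins, not ORP APIs.

GOLD_JUDGMENTS = {
    "jaguar speed": {"doc3", "doc7", "doc9"},   # docs judged relevant per query
    "apache lucene": {"doc1", "doc4"},
}

def run_query(query):
    # Stand-in for a real engine call; returns a ranked list of doc ids.
    canned = {
        "jaguar speed": ["doc3", "doc2", "doc7", "doc5", "doc9"],
        "apache lucene": ["doc1", "doc4", "doc8"],
    }
    return canned[query]

def precision_at_k(ranked, relevant, k=5):
    # Fraction of the top-k results that were judged relevant.
    hits = sum(1 for d in ranked[:k] if d in relevant)
    return hits / min(k, len(ranked))

def test_relevance_baseline(threshold=0.5):
    # Fail the build if any query's P@5 drops below the agreed baseline,
    # catching the "fix one search, break 10 others" problem.
    for query, relevant in GOLD_JUDGMENTS.items():
        p = precision_at_k(run_query(query), relevant)
        assert p >= threshold, f"{query}: P@5 {p:.2f} fell below baseline"

test_relevance_baseline()
print("baseline relevance checks passed")
```

In practice the judgments would come from a shared, openly licensed collection and `run_query` from whichever engine is under test, which is exactly the plug-in point where interaction with multiple engines matters.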

All of these are great reasons for you to participate and many align with the goals of the
project.  Above all else, O/S is about scratching your own itch.

> 
> I realize these are very different motives than an academic who's devoted 30 or 40 years
to studying IR.  Compared to that level of detail, these tests could be called a "drive-by".
:-)  As I've previously mentioned, convenient interaction with multiple engines is a paramount
concern.

Patches welcome, but we'll likely have to be careful about the rights on that code if it requires
libraries from proprietary vendors.  

> I've started the process of actually contributing some code that does this.  I assume
I'd use the JIRA/patch route for this.

+1