lucene-openrelevance-user mailing list archives

From Mark Bennett <mbenn...@ideaeng.com>
Subject Re. Lots to talk about, ORP
Date Wed, 23 Dec 2009 22:49:39 GMT
Thanks for the replies Robert and Grant,

So I think there's general agreement on a few things:

1: Ideally TREC and ORP should be able to inter-operate, and everybody's
welcome!  We can use the best of both in a complementary way when it's
appropriate.  If some of those TREC folks are on this list, maybe they could
chime in about the qrels data formats?  I'll have to go look up the CLEF
reference.

2: This should also inter-operate with multiple search engines, although
some of the early contributors are focused on Lucene derivatives.  Our
company tends to be very cross-engine focused.  Though I'm a fan of
Lucene/Solr/Nutch, we would seem to get more mileage if we could talk about
other engines early on, even if it's only in terms of documentation.

Are you guys on board with this?  There were comments like "First and
foremost, this project is a way for Lucene to talk about relevance in a
standard way..." and "I think for starters, our primary focus should be to
support improvements of apache lucene-related projects. Then we can expand
later... "

If we push that too hard, we'll scare away folks from other communities.  I
agree that people should each scratch their worst itch; I think it's in part
a question of positioning.  Solr and Nutch are very heavily associated with
Lucene, which is understandable.  But virtually every client we work with
has multiple engines, so we have a bit of a different itch I guess.

3: Multiple languages are good, even though some of the early content has
been selected more because it was available.  English might be a strategic
language to get covered early.  I'd really like to see a parallel set of
test documents and searches in multiple languages; that's what my client is
having to build.

4: We're open to data formats based on TREC qrels files or XML or anything
else we can get our hands on.  :-)  The Lucene folks would like to get them
into the Lucene benchmark format as well, and folks are flexible on binary
vs. non-binary grading.  It seems like all three of us are content with XML
as one possible format, and not too worried about the overhead.  Given all
the XML schemas out there, somebody might have already drafted one...  (A
quick sketch of the qrels format, and one possible XML flavor, is pasted
after this list.)

5: Interoperation with Excel might be nice, both for data entry by people
doing large-scale grading and for generating reports, though those two
activities are not related.

6: We'll use version control for the test doc set, to have reproducible
results and avoid content drift between tests.  Is there an SVN tree yet?  I
hadn't seen one.

7: More judges, either via a web UI or perhaps some open spreadsheet
template, will help smooth out individual differences in grades.

8: Need good tools to manage content and tests in an orderly fashion.

9: Yes, I think ASL style is compatible with most institutions and
businesses.
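
On the qrels format mentioned in #1 and #4: a standard TREC qrels file is
just whitespace-separated plain text, one judgment per line, giving the
topic (query) id, an iteration field that is almost always 0, the document
id, and the relevance grade (0/1 for binary, larger integers for graded
judging).  Something like this, with made-up ids:

  101 0 DOC-000123 1
  101 0 DOC-000456 0
  102 0 DOC-000789 2

And just to make the XML idea concrete, a judgment could be wrapped up
roughly like the following.  The element and attribute names here are purely
a sketch, not a proposal or an existing schema:

  <judgments topic="101">
    <judgment doc="DOC-000123" grade="1" judge="mbennett"/>
    <judgment doc="DOC-000456" grade="0" judge="mbennett"/>
  </judgments>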

Some other comments / replies to you both:

>> Examples:
>> * "Doc1 is more relevant to Search1 than Doc2"
>> * "I'd like to see at least 3 of these docs in the top 10 matches for
>>   this search"
>
> I am not really sure how these would work for large-scale search, I worry
> that it wouldnt be relevance testing but very specific tuning. In practice
> this is the kind of thing where you would just manually fudge these things
> anyway... or am I reading you wrong?

This is a long topic for discussion and I likely didn't explain it very
well.  I'll ponder it a bit more.

I would say that, yes, there is some relation between ORP and tuning.  As I
tune my search engine to fix one query, it'd be nice to know that I haven't
broken 100 others.  I've actually seen this happen; it's like a weird game of
"whack-a-mole".

> I've been thinking about this some lately too, at least we could start
> with apache lucene-related mail archives or something. I know this would be
> a wierd collection with very specific bias (all the code and everything) but
> it would still allow us to start creating some kind of framework to
> crowdsource judgements...

Do you have any experience with the data formats the lists are archived in?
I haven't looked.

> I think what we are doing is different. its not about taking anyone on,
> its about having something open available, even if right now its not yet as
> good, its better than nothing.

I generally agree.  Was there something in particular that seemed
otherwise?  Or just a general thought?

> Many clustering algorithms calculate distance much the same way that the
> engine scores, so it may just be a case of self-fulfilling prophesy.

Ah, again, I didn't explain very well.

For manual grading, one approach is to have a simple grid of documents and
searches, and then ask folks to fill in all the squares.

A slightly different approach is to manually group documents into logical
groups.  In fact, you can have users do this in 2 or 3 iterations.  Having
these groups can make it a bit easier to grade specific questions against
sets of documents, although it might introduce some bias as well.

It's just a different approach to having humans manually grade a mountain of
documents against test searches.
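
For what it's worth, the "grid" I have in mind is nothing fancier than a
matrix of (search, document) cells holding a grade, and grouping documents
first just means you get to fill a lot of cells in one go.  A tiny sketch,
again with invented names rather than a proposed ORP class:

  import java.util.*;

  /** Sketch only: a judging grid keyed by query id, then by doc id.
   *  A missing cell simply means "not judged yet". */
  public class JudgingGrid {

    // queryId -> (docId -> grade)
    private final Map<String, Map<String, Integer>> cells =
        new HashMap<String, Map<String, Integer>>();

    /** Record a single grade for one (query, document) cell. */
    public void grade(String queryId, String docId, int grade) {
      Map<String, Integer> row = cells.get(queryId);
      if (row == null) {
        row = new HashMap<String, Integer>();
        cells.put(queryId, row);
      }
      row.put(docId, grade);
    }

    /** Grade a whole pre-built group of documents at once -- this is
     *  where the manual grouping step pays off. */
    public void gradeGroup(String queryId, Collection<String> docGroup, int grade) {
      for (String docId : docGroup) {
        grade(queryId, docId, grade);
      }
    }

    /** Returns the grade, or null if that cell hasn't been filled in. */
    public Integer gradeFor(String queryId, String docId) {
      Map<String, Integer> row = cells.get(queryId);
      return row == null ? null : row.get(docId);
    }

    public static void main(String[] args) {
      JudgingGrid grid = new JudgingGrid();
      grid.gradeGroup("lucene scoring", Arrays.asList("doc-7", "doc-12"), 2);
      grid.grade("lucene scoring", "doc-99", 0);
      System.out.println(grid.gradeFor("lucene scoring", "doc-12"));  // prints 2
    }
  }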



--
Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
