lucene-java-user mailing list archives

From "Karl Koch" <TheRan...@gmx.net>
Subject Re: Lucene scoring: coord_q_d factor
Date Wed, 13 Dec 2006 15:00:28 GMT
Hello Steven,

unfortunately I don't have access to these books right now. I will try to get hold of them.
Thank you for these pointers. :)

I had a quick look at "coordination level matching" on the web and found evidence that it
was an early retrieval strategy. My main question is why one should use coordination
level matching if one is already doing (proper) TFxIDF-based matching. When I look at Lucene's
scoring formula, it seems to me that two kinds of matching are performed and combined
in a single formula.
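
To make that combination concrete, here is a minimal sketch of how I read the formula. It is
not Lucene's actual code: the class and method names are hypothetical, the tf, idf and coord
definitions are the ones I understand DefaultSimilarity to use (sqrt(freq), 1 + ln(numDocs/(docFreq+1)),
overlap/maxOverlap), and boosts, length norms and queryNorm are left out on purpose.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified sketch; not taken from the Lucene source.
class CoordSketch {

    // Term-frequency factor (assumed DefaultSimilarity form).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse-document-frequency factor (assumed DefaultSimilarity form).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Coordination factor: fraction of query terms that occur in the document.
    static double coord(int overlap, int maxOverlap) {
        return (double) overlap / maxOverlap;
    }

    // TFxIDF part only: sum of tf * idf^2 over the query terms present in the document.
    static double tfIdfSum(List<String> query, Map<String, Integer> termFreqs,
                           Map<String, Integer> docFreqs, int numDocs) {
        double sum = 0.0;
        for (String term : query) {
            int freq = termFreqs.getOrDefault(term, 0);
            if (freq == 0) {
                continue; // term does not occur in this document
            }
            double w = idf(docFreqs.getOrDefault(term, 0), numDocs);
            sum += tf(freq) * w * w;
        }
        return sum;
    }

    // Number of distinct query terms that occur in the document.
    static int overlap(List<String> query, Map<String, Integer> termFreqs) {
        int n = 0;
        for (String term : query) {
            if (termFreqs.getOrDefault(term, 0) > 0) {
                n++;
            }
        }
        return n;
    }

    // Combined score: coordination factor multiplied by the TFxIDF sum.
    static double score(List<String> query, Map<String, Integer> termFreqs,
                        Map<String, Integer> docFreqs, int numDocs) {
        return coord(overlap(query, termFreqs), query.size())
                * tfIdfSum(query, termFreqs, docFreqs, numDocs);
    }

    public static void main(String[] args) {
        List<String> query = List.of("a", "b", "c");
        Map<String, Integer> docFreqs = Map.of("a", 10, "b", 10, "c", 10);
        int numDocs = 100;

        Map<String, Integer> doc1 = Map.of("a", 9);                 // one query term, nine times
        Map<String, Integer> doc2 = Map.of("a", 1, "b", 1, "c", 1); // all three terms, once each

        System.out.println("tf-idf only: " + tfIdfSum(query, doc1, docFreqs, numDocs)
                + " vs " + tfIdfSum(query, doc2, docFreqs, numDocs));
        System.out.println("with coord:  " + score(query, doc1, docFreqs, numDocs)
                + " vs " + score(query, doc2, docFreqs, numDocs));
    }
}
```

In this made-up example the TFxIDF sum alone cannot separate the two documents, because one
term occurring nine times contributes as much as three terms occurring once each. The coord
factor scales the first document by 1/3 and leaves the second unchanged, so the document that
matches all query terms ranks higher. Whether that is the intended motivation is exactly what
I am asking about.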

In the paper "Exploiting the Similarity of Non-matching Terms at Retrieval Time", which can
be found here:

http://www.cis.strath.ac.uk/~fabioc/papers/00-jir.pdf

coordination level matching is compared directly with TFxIDF. To me, it seems that coordination
level matching could be used if I don't want to use TFxIDF, but not together with it. In this
context, I wonder what benefit coordination level matching has in combination with TFxIDF.

It is likely that I have some kind of misunderstanding here. Perhaps with your help I can
untangle it a bit further. As I said earlier, I am only looking for a reasonable explanation
(perhaps backed by some evidence from the literature) that makes it clear why it is used together
with TFxIDF.

Thank you,
Karl



-------- Original Message --------
Date: Tue, 12 Dec 2006 17:15:48 -0500
From: Steven Rowe <sarowe@syr.edu>
To: java-user@lucene.apache.org
Subject: Re: Lucene scoring: coord_q_d factor

> Karl Koch wrote:
> > Is there any other paper that actually shows the benefit of doing 
> > this particular normalisation with coord_q_d? I am not suggesting
> > here that it is not useful, I am just looking for evidence how the
> > idea developed.
> 
> I think it's a mischaracterization to call coordination a
> "normalization".  In my mind, "normalization" is something applied
> equally to all documents' scores.  The coordination component of a
> document's score varies from document to document, and so doesn't meet
> this criterion.
> 
> I repeat the citation of the book cited by the paper I cited :) :
> 
> >> Salton, G. & McGill, M. Introduction to Modern Information
> >> Retrieval. McGraw-Hill, New York, 1983.
> 
> In addition to the above book, here are two other books that I've seen
> cited as describing "coordination-level matching" (a.k.a. "overlap
> ranking"):
> 
> Salton, G. (1968). Automatic information organization and retrieval.
> New York: McGraw-Hill.
> 
> Lancaster, F.W. (1979). Information retrieval systems: Characteristics,
> testing and evaluation (2nd ed.). New York: Wiley.
> 
> I don't know the answer to your larger question: why use a coordination
> component in a similarity measure when other components (tf*idf) seem to
> serve the same function?  What you seem to be looking for is a study
> that directly compares a system using a coordination component in its
> similarity measure with the *same* system, varying the measure only in
> that coordination is elided.  Unfortunately, I know of no such study.
> 
> Good luck,
> Steve
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
