lucene-java-user mailing list archives

From Steven Rowe <sar...@syr.edu>
Subject Re: Lucene scoring: coord_q_d factor
Date Tue, 12 Dec 2006 22:15:48 GMT
Karl Koch wrote:
> Is there any other paper that actually shows the benefit of doing 
> this particular normalisation with coord_q_d? I am not suggesting
> here that it is not useful, I am just looking for evidence how the
> idea developed.

I think it's a mischaracterization to call coordination a
"normalization".  In my mind, "normalization" is something applied
equally to all documents' scores.  The coordination component of a
document's score varies from document to document, and so doesn't meet
this criterion.
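
To make the point concrete, here is a minimal sketch of a coordination
factor computed as an overlap fraction: the number of query terms a
document matches, divided by the number of terms in the query. As far
as I know this is the form Lucene's DefaultSimilarity.coord(overlap,
maxOverlap) uses, but treat that as an assumption. The class name
CoordSketch and the four-term query are hypothetical.

```java
public class CoordSketch {

    // Fraction of the query's terms that a given document matched.
    // overlap:    number of query terms found in the document
    // maxOverlap: total number of terms in the query
    public static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        int queryTerms = 4; // hypothetical four-term query

        // Each document can match a different number of terms, so the
        // factor differs from one document to the next.
        for (int matched = 1; matched <= queryTerms; matched++) {
            System.out.println(matched + " of " + queryTerms
                    + " terms matched, coord = " + coord(matched, queryTerms));
        }
    }
}
```

Because the factor depends on how many terms each document matched, it
rescales each document's score by a different amount, unlike a factor
that is the same for every document scored against a query.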

I repeat the citation of the book cited by the paper I cited :) :

>> Salton, G. & McGill, M. Introduction to Modern Information
>> Retrieval. McGraw-Hill, New York, 1983.

In addition to the above book, here are two other books that I've seen
cited as describing "coordination-level matching" (a.k.a. "overlap
ranking"):

Salton, G. (1968). Automatic information organization and retrieval.
New York: McGraw-Hill.

Lancaster, F.W. (1979). Information retrieval systems: Characteristics,
testing and evaluation (2nd ed.). New York: Wiley.

I don't know the answer to your larger question: why use a coordination
component in a similarity measure when other components (tf*idf) seem to
serve the same function?  What you seem to be looking for is a study
that directly compares a system using a coordination component in its
similarity measure with the *same* system, varying the measure only in
that coordination is elided.  Unfortunately, I know of no such study.
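
The kind of comparison I have in mind can be sketched as a single
scoring function with a switch for the coordination component. This is
a toy, not Lucene's actual scoring formula: the class name
CoordAblationSketch, the score method, and the per-term weights
(stand-ins for tf*idf contributions) are all made up for illustration.

```java
public class CoordAblationSketch {

    // termWeights holds one hypothetical weight per query term;
    // 0.0 means the term did not match the document.
    public static double score(double[] termWeights, boolean useCoord) {
        double sum = 0.0;
        int overlap = 0;
        for (double w : termWeights) {
            if (w > 0.0) {
                sum += w;
                overlap++;
            }
        }
        if (!useCoord) {
            return sum; // same measure with coordination elided
        }
        // Scale by the fraction of query terms that matched.
        return sum * overlap / termWeights.length;
    }

    public static void main(String[] args) {
        // Made-up weights for a three-term query.
        double[] docA = {3.0, 0.0, 0.0}; // one strongly weighted match
        double[] docB = {0.8, 0.8, 0.8}; // all three terms, weakly weighted

        System.out.println("without coord: A=" + score(docA, false)
                + " B=" + score(docB, false));
        System.out.println("with coord:    A=" + score(docA, true)
                + " B=" + score(docB, true));
    }
}
```

With these made-up weights the two variants order A and B differently,
so the component can change a ranking. The sketch says nothing about
which ordering is better; that is what a real study would have to
measure against relevance judgments.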

Good luck,
Steve

