lucene-java-user mailing list archives

From "Karl Koch" <TheRan...@gmx.net>
Subject Re: Lucene scoring: coord_q_d factor
Date Tue, 12 Dec 2006 16:47:06 GMT
Hello Steven,

I looked up the paper and read the relevant part. The passage you quoted is from the
introduction. I believe it refers to the basic purpose of an information retrieval
system in general, or at least to the purpose of a vector space model IR system.

If this is the theoretical justification of the coord_q_d normalisation, then it is actually
replicating the other part of the scoring formula to some degree. The entire formula is
concerned with exactly this: comparing the term frequencies of query and document.
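To make the overlap question concrete, here is a minimal, self-contained sketch. It is not Lucene code: the class `CoordSketch` and its methods are hypothetical, and only the tf, idf and coord formulas are taken from those documented for Lucene's `DefaultSimilarity` (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), coord = overlap / maxOverlap). Boosts, lengthNorm and queryNorm are deliberately left out. The point is that coord depends only on how many query terms a document contains, not on how often they occur, so it can reorder documents that the tf-idf sum alone would rank the other way.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Lucene's implementation) of how a coord(q,d)
 * factor interacts with a summed tf * idf^2 score. Boosts, lengthNorm and
 * queryNorm are omitted on purpose.
 */
public class CoordSketch {

    // tf(t in d) = sqrt(frequency of t in d)
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf(t) = 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // coord(q,d) = overlap / maxOverlap: fraction of query terms found in d
    static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    /** Sum of tf * idf^2 over matching query terms, optionally scaled by coord. */
    static double score(List<String> query, Map<String, Integer> docTermFreqs,
                        Map<String, Integer> docFreqs, int numDocs, boolean useCoord) {
        double sum = 0.0;
        int overlap = 0; // number of distinct query terms present in the document
        for (String term : query) {
            int freq = docTermFreqs.getOrDefault(term, 0);
            if (freq > 0) {
                overlap++;
                double idf = idf(docFreqs.get(term), numDocs);
                sum += tf(freq) * idf * idf;
            }
        }
        return useCoord ? coord(overlap, query.size()) * sum : sum;
    }

    public static void main(String[] args) {
        List<String> query = List.of("lexical", "ambiguity");
        Map<String, Integer> docFreqs = Map.of("lexical", 10, "ambiguity", 10);
        int numDocs = 1000;

        // Doc A repeats one query term many times; doc B contains both terms once.
        Map<String, Integer> docA = Map.of("lexical", 9);
        Map<String, Integer> docB = Map.of("lexical", 1, "ambiguity", 1);

        System.out.printf("without coord: A=%.3f B=%.3f%n",
                score(query, docA, docFreqs, numDocs, false),
                score(query, docB, docFreqs, numDocs, false));
        System.out.printf("with coord:    A=%.3f B=%.3f%n",
                score(query, docA, docFreqs, numDocs, true),
                score(query, docB, docFreqs, numDocs, true));
    }
}
```

With these made-up counts, doc A outranks doc B on the tf-idf sum alone (one term with frequency 9 beats two terms with frequency 1), but multiplying by coord halves A's score and B comes out ahead. So coord is not simply a repeat of the tf-idf part: it adds an explicit reward for covering more of the query, in the spirit of a "coordination match".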

Is there any other paper that actually shows the benefit of doing this particular normalisation
with coord_q_d? I am not suggesting that it is not useful; I am just looking for evidence of
how the idea developed.

Karl




-------- Original Message --------
Date: Tue, 12 Dec 2006 10:01:05 -0500
From: Steven Rowe <sarowe@syr.edu>
To: java-user@lucene.apache.org
Subject: Re: Lucene scoring: coord_q_d factor

> Karl Koch wrote:
> > The coord(q,d) normalisation is "a score factor based on how many of
> > the query terms are found in the specified document." and described
> > here:
> > 
> > http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord
> > 
> > Does this have a theoretical basis? On what grounds was the decision
> > made to have it? Does anybody know a paper (in Information Retrieval,
> > Information Seeking, etc.) or other more general information about
> > this?
> 
> Following is quoted from: Krovetz, R. & Croft, W. B. (1992) Lexical
> Ambiguity and Information Retrieval. ACM Transactions on Information
> Systems, 10(2): 115-141.
> 
>     Many retrieval systems represent documents and queries
>     by the words they contain, and base the comparison on
>     the number of words they have in common. The more
>     words the query and document have in common, the
>     higher the document is ranked; this is referred to as
>     a "coordination match."  Performance is improved by
>     weighting query and document words using frequency
>     information from the collection and individual
>     document texts [27].
> 
> 27. Salton, G. & McGill, M. Introduction to Modern Information
> Retrieval. McGraw-Hill, New York, 1983.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
