lucene-java-user mailing list archives

From Steven Rowe <sar...@syr.edu>
Subject Re: Lucene scoring: coord_q_d factor
Date Tue, 12 Dec 2006 15:01:05 GMT
Karl Koch wrote:
> The coord(q,d) normalisation is "a score factor based on how many of
> the query terms are found in the specified document." and described
> here:
> 
> http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord
> 
> Does this have a theoretical basis? On what basis was the decision
> made to have it? Does anybody know a paper (in Information Retrieval,
> Information Seeking, etc.) or other more general information about
> this?

The following is quoted from: Krovetz, R. & Croft, W. B. (1992) Lexical
Ambiguity and Information Retrieval. ACM Transactions on Information
Systems, 10(2): 115-141.

    Many retrieval systems represent documents and queries
    by the words they contain, and base the comparison on
    the number of words they have in common. The more
    words the query and document have in common, the
    higher the document is ranked; this is referred to as
    a "coordination match."  Performance is improved by
    weighting query and document words using frequency
    information from the collection and individual
    document texts [27].

27. Salton, G. & McGill, M. Introduction to Modern Information
Retrieval. McGraw-Hill, New York, 1983.
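
To make the "coordination match" idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. The class name CoordSketch and the helper overlap() are hypothetical and exist only for illustration. The coord() method assumes the factor is simply the fraction of query terms found in the document (overlap / maxOverlap), which is my understanding of what Lucene's default Similarity computes; treat it as an approximation of the idea rather than the actual implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CoordSketch {

    // Counts how many distinct query terms also occur in the document.
    // (Hypothetical helper; Lucene derives this count during scoring.)
    static int overlap(Set<String> queryTerms, Set<String> docTerms) {
        int count = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                count++;
            }
        }
        return count;
    }

    // Assumed form of the coordination factor: the fraction of query
    // terms that matched. More shared terms give a larger factor.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(
                Arrays.asList("lexical", "ambiguity", "retrieval"));
        Set<String> doc = new HashSet<>(
                Arrays.asList("information", "retrieval", "lexical", "systems"));

        int matched = overlap(query, doc);
        System.out.println(matched + " of " + query.size()
                + " query terms matched, coord = "
                + coord(matched, query.size()));
    }
}
```

In this sketch a document matching two of three query terms gets a factor of 2/3, and one matching all three gets 1. The term weighting that the quoted passage mentions (frequency information from the collection and the document) would be applied on top of this factor and is not modelled here.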

