lucene-dev mailing list archives

From Chris Hostetter
Subject Re: Request to change "coord" similarity API:
Date Fri, 24 Aug 2007 06:26:47 GMT

: I'm hoping that coord similarity API can be changed from:
: float coord(int overlap, int maxOverlap)
: to:
: float coord(int overlap, int maxOverlap, int docSize)

that's a pretty significant change ... especially considering Lucene
doesn't know the docSize.  you may want to review the comments in another
recent related thread that suggested incorporating the average doc

: score.  Nothing can help here, changing lengthNorm to intentionally lower
: the score of car names as they get longer doesn't make sense, the "Volvo V70
: Wagon Luxury Edition Sports Package AWD" is just as much of a car as the

the long name may be "just as much of a car" as the short name, but the
lengthNorm by itself isn't really important -- it's all relative, the
lengthNorm is just there to help offset other factors such as higher tfs
and in the case of larger boolean queries: a higher coord factor.
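To make the "it's all relative" point concrete, here is a minimal sketch of how the factors trade off.  The formulas are the ones I understand DefaultSimilarity to use (tf = sqrt(freq), lengthNorm = 1/sqrt(numTerms), coord = overlap/maxOverlap); the class name SimilaritySketch and the termScore helper are made up for illustration, and idf, boosts, queryNorm and the lossy one-byte norm encoding are all left out.

```java
// Toy model of three similarity factors; NOT Lucene's actual scoring code.
class SimilaritySketch {
    // Term frequency factor: grows with the square root of the raw count.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Length normalization: shrinks as the field gets longer.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Coordination factor: fraction of the query's clauses that matched.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical helper: one term's contribution, everything else held equal.
    static float termScore(float freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // A 4-term field containing the term once ...
        System.out.println(termScore(1, 4));   // prints 0.5
        // ... ties with a 16-term field containing it four times.
        System.out.println(termScore(4, 16));  // prints 0.5
        // Matching 2 of 3 query clauses scales the whole score down.
        System.out.println(coord(2, 3));
    }
}
```

In this toy model the longer field's higher tf is exactly cancelled by its smaller lengthNorm, which is the offsetting effect described above.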

Regarding your specific problem: other people have solved this using
PhraseQueries with extremely large slop, and sentinel terms indexed at
the start and end of their field values.  i.e. ...

Doc1:  _START_ Volvo V70 Wagon _END_
Doc2:  _START_ Volvo V70 Wagon Luxury Edition Sports Package AWD _END_

User Input: Volvo V70 Wagon
Query:     SpanNearQuery(_START_, Volvo, V70, Wagon, _END_, 10000)

...both docs will match, but Doc1 will match with a much higher score,
since the span it matches is much shorter.
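A rough sketch of why the sentinels help, written without any Lucene dependency.  It wraps each value in the sentinel terms, measures how wide the in-order match is, and turns the extra width into a score using the 1/(distance+1) shape of DefaultSimilarity.sloppyFreq.  Everything here (SentinelSpanSketch, withSentinels, matchWidth, toyScore) is a hypothetical name for illustration, and the score is only a toy stand-in for what a real span query would compute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration of the sentinel-term trick; NOT Lucene code.
class SentinelSpanSketch {
    static final String START = "_START_";
    static final String END = "_END_";

    // Wrap a field value's whitespace-separated tokens in the sentinel terms.
    static List<String> withSentinels(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(START);
        tokens.addAll(Arrays.asList(text.trim().split("\\s+")));
        tokens.add(END);
        return tokens;
    }

    // Width (in positions) of an in-order match of the query terms, or -1 if
    // there is none.  The search is greedy, which is enough here because the
    // sentinels pin the match to the first and last position of the field.
    static int matchWidth(List<String> doc, List<String> query) {
        int start = -1;
        int pos = -1;
        for (String term : query) {
            int found = -1;
            for (int i = pos + 1; i < doc.size(); i++) {
                if (doc.get(i).equals(term)) { found = i; break; }
            }
            if (found < 0) return -1;
            if (start < 0) start = found;
            pos = found;
        }
        return pos - start + 1;
    }

    // Toy score: 1 / (extra positions + 1), so a tight match scores 1.0.
    static float toyScore(List<String> doc, List<String> query) {
        int width = matchWidth(doc, query);
        if (width < 0) return 0f;
        return 1.0f / (width - query.size() + 1);
    }

    public static void main(String[] args) {
        List<String> query = withSentinels("Volvo V70 Wagon");
        List<String> doc1 = withSentinels("Volvo V70 Wagon");
        List<String> doc2 = withSentinels("Volvo V70 Wagon Luxury Edition Sports Package AWD");
        System.out.println(toyScore(doc1, query)); // prints 1.0
        System.out.println(toyScore(doc2, query)); // lower: 5 extra positions
    }
}
```

Note that the Query line above is shorthand: as far as I know the real SpanNearQuery constructor takes an array of SpanQuery clauses (e.g. one SpanTermQuery per term), a slop, and an inOrder flag.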

