lucene-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Request to change "coord" similarity API:
Date Fri, 24 Aug 2007 06:26:47 GMT

: I'm hoping that coord similarity API can be changed from:
: float coord(int overlap, int maxOverlap)
	...
: float coord(int overlap, int maxOverlap, int docSize)

that's a pretty significant change ... especially considering Lucene
doesn't know the docSize.  you may want to review the comments in another
recent related thread that suggested incorporating the average doc
length...

http://www.nabble.com/search-quality---assessment---improvements-tf3974580.html#a11701392

: score.  Nothing can help here, changing lengthNorm to intentionally lower
: the score of car names as they get longer doesn't make sense, the "Volvo V70
: Wagon Luxury Edition Sports Package AWD" is just as much of a car as the

the long name may be "just as much of a car" as the short name, but the
lengthNorm by itself isn't really important -- it's all relative, the
lengthNorm is just there to help offset other factors such as higher tfs
and in the case of larger boolean queries: a higher coord factor.
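
To make the "it's all relative" point concrete, here is a minimal sketch of how the factors interact. It assumes the classic default Similarity formulas (tf = sqrt(freq), lengthNorm = 1/sqrt(numTerms), coord = overlap/maxOverlap) and re-implements them as plain functions rather than calling Lucene. The class NormSketch and the method toyScore are hypothetical names used only for illustration; idf, boosts, and norm encoding are ignored.

```java
// Simplified stand-ins for the classic default Similarity factors.
// These are illustrative re-implementations, not calls into Lucene itself.
final class NormSketch {
    // Assumed default: term frequency is dampened with a square root.
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Assumed default: fields with more terms get a smaller norm.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Assumed default: fraction of query clauses the document matched.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical toy score for a single field: sum of tf over the matched
    // terms, scaled by lengthNorm and coord. idf and boosts are left out.
    static float toyScore(int[] matchedTermFreqs, int numTerms, int maxOverlap) {
        float sum = 0f;
        for (int freq : matchedTermFreqs) {
            sum += tf(freq);
        }
        return sum * lengthNorm(numTerms) * coord(matchedTermFreqs.length, maxOverlap);
    }

    public static void main(String[] args) {
        // Same three query terms matched once each, in a 3-term and an 8-term field.
        float shortDoc = toyScore(new int[] {1, 1, 1}, 3, 3);
        float longDoc = toyScore(new int[] {1, 1, 1}, 8, 3);
        // A long field that repeats each matched term four times.
        float longRepeating = toyScore(new int[] {4, 4, 4}, 8, 3);
        System.out.println("short=" + shortDoc + " long=" + longDoc
                + " longRepeating=" + longRepeating);
    }
}
```

With equal tfs the shorter field wins, but a longer field that repeats the matched terms can still come out ahead, so in this toy model the lengthNorm only counterbalances the other factors rather than deciding the ranking on its own.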


Regarding your specific problem: other people have solved this using
PhraseQueries with extremely large slop, and sentinel terms indexed at
the start and end of their field values.  i.e. ...

Doc1:  _START_ Volvo V70 Wagon _END_
Doc2:  _START_ Volvo V70 Wagon Luxury Edition Sports Package AWD _END_

User Input: Volvo V70 Wagon
Query:     SpanNearQuery(_START_, Volvo, V70, Wagon, _END_, 10000)

...both docs will match, Doc1 will match with a much higher score.
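
A rough sketch of why Doc1 comes out ahead, without touching the Lucene API: it assumes the classic default sloppyFreq of 1 / (distance + 1) and takes "distance" to be the number of positions between the sentinels that the query terms do not cover. The exact distance Lucene feeds into sloppyFreq depends on the version, so treat this as a model of the effect, not of the real scorer. SentinelSpanSketch, extraPositions, and toyScore are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the sentinel trick: the fewer unmatched positions between
// _START_ and _END_, the higher the score. Not Lucene code.
final class SentinelSpanSketch {
    static final String START = "_START_";
    static final String END = "_END_";

    // Assumed classic default: closer matches get a larger frequency.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Returns how many positions between the sentinels are NOT query terms,
    // or -1 if the query terms do not all occur, in order, between them.
    static int extraPositions(List<String> fieldTokens, List<String> queryTerms) {
        int start = fieldTokens.indexOf(START);
        int end = fieldTokens.lastIndexOf(END);
        if (start < 0 || end < start) {
            return -1;
        }
        int next = 0; // index of the next query term still to be found
        for (int i = start + 1; i < end && next < queryTerms.size(); i++) {
            if (fieldTokens.get(i).equals(queryTerms.get(next))) {
                next++;
            }
        }
        if (next < queryTerms.size()) {
            return -1;
        }
        return (end - start - 1) - queryTerms.size();
    }

    // Wraps the field value in sentinels, as it would be indexed, and scores it.
    static float toyScore(String fieldValue, String userInput) {
        List<String> tokens =
                Arrays.asList((START + " " + fieldValue + " " + END).split("\\s+"));
        List<String> terms = Arrays.asList(userInput.split("\\s+"));
        int extra = extraPositions(tokens, terms);
        return extra < 0 ? 0f : sloppyFreq(extra);
    }

    public static void main(String[] args) {
        String input = "Volvo V70 Wagon";
        System.out.println(toyScore("Volvo V70 Wagon", input));
        System.out.println(toyScore("Volvo V70 Wagon Luxury Edition Sports Package AWD", input));
    }
}
```

In this model Doc1 has no extra positions between the sentinels while Doc2 has five, so both match but Doc1 gets the larger sloppy frequency.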




-Hoss

