lucene-dev mailing list archives

From mark harwood <markharw...@yahoo.co.uk>
Subject More fuzzy issues - encouraging bad spelling?
Date Thu, 23 Dec 2004 13:25:21 GMT
Another thought on fuzzy scoring:
Shouldn't all these queries that automatically expand
terms favour common words over rare ones? The default
scoring behaviour at the moment favours rare words. As
a user, aren't I more likely to be looking for the most
common expansions?

If I'm not sure how to spell a word, I might search for:
accomodation~
or
accom*
Scoring of the expanded terms will currently favour all
of the misspellings of accommodation in the ranking
of results, because they are rarer.
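
To make the effect concrete, here is a minimal sketch in Python. It assumes the classic idf formula, log(numDocs / (docFreq + 1)) + 1, and the index size and document frequencies are made-up numbers chosen purely for illustration.

```python
import math

def idf(doc_freq, num_docs):
    # Classic idf: the rarer the term, the larger the weight.
    return math.log(num_docs / (doc_freq + 1)) + 1.0

# Hypothetical index size and document frequencies (illustrative only).
num_docs = 100_000
doc_freqs = {
    "accommodation": 900,  # correct spelling, common
    "accomodation": 40,    # misspelling, rare
    "acommodation": 5,     # misspelling, rarer still
}

weights = {term: idf(df, num_docs) for term, df in doc_freqs.items()}

# Sort expansions by weight, highest first.
ranked = sorted(weights, key=weights.get, reverse=True)
for term in ranked:
    print(f"{term:15s} df={doc_freqs[term]:4d} idf={weights[term]:.2f}")
```

With these numbers the rarest misspelling gets the largest weight and the correct spelling the smallest, which is the ordering described above.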

Ideally, within the expansions of a term, the score
contribution should be based on document frequency (df,
as opposed to the usual idf), BUT within the overall
query the usual idf scheme should apply. To clarify:
If I search for:
  the cheapest accomodation~ in london
I want to see the most common spellings of
accommodation before all other variants of this word
BUT I then want these variants scored against the
OTHER words ("in", "the" etc) on the usual basis of
rarity.
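
One way this two-level scheme might be sketched in Python is shown below. It is only an illustration under stated assumptions, not an existing Lucene feature: the helper expansion_weights is hypothetical, the variants are treated as a single term whose df is approximated by summing the variants' dfs (an over-estimate when documents contain several spellings), and all frequencies are invented.

```python
import math

def idf(doc_freq, num_docs):
    # Classic idf, used for weighting between the query's terms.
    return math.log(num_docs / (doc_freq + 1)) + 1.0

def expansion_weights(variant_dfs, num_docs):
    """Hypothetical helper: weight each variant of one expanded term.

    The group as a whole competes with other query terms via idf,
    while inside the group a variant's share grows with its df.
    """
    # Assumption: approximate the group's df by summing variant dfs.
    group_idf = idf(sum(variant_dfs.values()), num_docs)
    max_df = max(variant_dfs.values())
    return {term: group_idf * df / max_df
            for term, df in variant_dfs.items()}

# Hypothetical numbers for "the cheapest accomodation~ in london".
num_docs = 100_000
variants = {"accommodation": 900, "accomodation": 40, "acommodation": 5}

fuzzy = expansion_weights(variants, num_docs)
other = {term: idf(df, num_docs)
         for term, df in {"the": 95_000, "london": 2_000}.items()}

for term, weight in sorted(fuzzy.items(), key=lambda kv: -kv[1]):
    print(f"{term:15s} {weight:.3f}")
```

Here the most common spelling receives the full group weight and rarer variants receive proportionally less, while the group still outweighs a very common word such as "the". The linear df / max_df scaling is just one choice; a damped ratio (for example a square root) would penalise rare variants less severely.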

This suggests a sort order within another, different
sort order.
This seems like it would not be easy to do. Any bright
ideas?

Cheers
Mark


	
	
		