lucene-dev mailing list archives

From mark harwood <>
Subject More fuzzy issues - encouraging bad spelling?
Date Thu, 23 Dec 2004 13:25:21 GMT
Another thought on fuzzy scoring:
Shouldn't all the queries that automatically expand
terms favour common words over rare ones? The default
scoring behaviour at the moment favours rare words. As
a user, aren't I more likely to be looking for the most
common expansions?

If I'm not sure how to spell I might search for:
The fuzzy scoring algorithms will currently favour all
of the mis-spellings of accommodation in the ranking
of results because they are rarer than the correct
spelling.

Ideally, within the expansions of a term the score
contribution should be based on df (document frequency,
as opposed to the usual idf, inverse document frequency)
BUT within the overall query the usual idf scheme
applies. To clarify:
If I search for:
  the cheapest accomodation~ in london
I want to see the most common spellings of
accommodation before all other variants of this word
BUT I then want these variants scored against the
OTHER words ("in", "the", etc.) on the usual basis of
idf.

This suggests a sort order within another, different
sort order.
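
To make the two-level idea concrete, here is a minimal sketch in plain
Python. It is not Lucene code and does not use any Lucene API. All of the
function names are hypothetical, the idf formula is only assumed to have
the familiar 1 + ln(N / (df + 1)) shape, and summing the variants' df to
get a df for the whole expansion is a simplification made for
illustration. Variants of the fuzzy term are weighted by their share of
df, while the fuzzy clause as a whole competes with the other query terms
on idf.

```python
import math


def idf(doc_freq, num_docs):
    # Assumed idf shape: rarer terms get a larger weight.
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))


def expansion_weights(variant_dfs):
    # Inner ordering: within one fuzzy expansion, weight each variant
    # by its share of the total df, so common spellings rank first.
    total = sum(variant_dfs.values())
    return {term: df / total for term, df in variant_dfs.items()}


def fuzzy_clause_idf(variant_dfs, num_docs):
    # Outer ordering: treat the whole expansion as one pseudo-term.
    # Summing df over-counts documents containing several variants,
    # so the result is capped at num_docs (illustrative simplification).
    combined_df = min(sum(variant_dfs.values()), num_docs)
    return idf(combined_df, num_docs)


def score_doc(doc_terms, plain_dfs, variant_dfs, num_docs):
    # doc_terms: set of terms in the document.
    # plain_dfs: df for each ordinary (non-fuzzy) query term.
    # variant_dfs: df for each expansion of the fuzzy query term.
    score = 0.0
    for term, df in plain_dfs.items():
        if term in doc_terms:
            score += idf(df, num_docs)

    weights = expansion_weights(variant_dfs)
    matched = [weights[t] for t in variant_dfs if t in doc_terms]
    if matched:
        # Only the best-matching variant contributes, scaled by the
        # idf of the fuzzy clause as a whole.
        score += fuzzy_clause_idf(variant_dfs, num_docs) * max(matched)
    return score
```

With made-up df figures where "accommodation" is far more frequent than
"accomodation", a document containing the common spelling scores higher
than an otherwise identical document containing the rare one, yet both
still gain from matching rare words elsewhere in the query.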
This seems like it would not be easy to do. Any bright
ideas?

