lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From eks dev <eks...@yahoo.co.uk>
Subject Re: combining open office spellchecker with Lucene
Date Fri, 10 Sep 2004 08:04:03 GMT
Hi Doug,

> Perhaps.  Are folks really better at spelling the
> beginning of words?

Yes they are. There were some comprehensive empirical
studies on this topic. Winkler modification on Jaro
string distance is based on this assumption (boosting
similarity if first n, I think 4, chars match).
Jaro-Winkler is well documented and some folks thinks
that it is much more efficient and precise than plain
Edit distance (of course for normal language, not
numbers or so).
I will try to dig-out some references from my disk on
this topic, if you are interested.

On another note,
I would even suggest using Jaro-Winkler distance as
default for fuzzy query. (one could configure max
prefix required => prefix query to reduce number of
distance calculations). This could speed-up fuzzy
search dramatically.

Hope this was helpful,
Eks




  



	
	
		
___________________________________________________________ALL-NEW Yahoo! Messenger - all
new features - even more fun!  http://uk.messenger.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message