lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Spencer <dave-lucene-u...@tropo.com>
Subject Re: combining open office spellchecker with Lucene
Date Fri, 10 Sep 2004 15:05:46 GMT
eks dev wrote:

> Hi Doug,
> 
> 
>>Perhaps.  Are folks really better at spelling the
>>beginning of words?
> 
> 
> Yes they are. There were some comprehensive empirical
> studies on this topic. Winkler modification on Jaro
> string distance is based on this assumption (boosting
> similarity if first n, I think 4, chars match).
> Jaro-Winkler is well documented and some folks thinks
> that it is much more efficient and precise than plain
> Edit distance (of course for normal language, not
> numbers or so).
> I will try to dig-out some references from my disk on

Good ole Citeseer finds 2 docs that seem relevant:

http://citeseer.ist.psu.edu/cs?cs=1&q=Winkler+Jaro&submit=Documents&co=Citations&cm=50&cf=Any&ao=Citations&am=20&af=Any

I have some of the ngram spelling suggestion stuff, based on earlier 
msgs in this thread, working in my dev tree. I'll try to get a test site 
up later today for people to fool around with.


> this topic, if you are interested.
> 
> On another note,
> I would even suggest using Jaro-Winkler distance as
> default for fuzzy query. (one could configure max
> prefix required => prefix query to reduce number of
> distance calculations). This could speed-up fuzzy
> search dramatically.
> 
> Hope this was helpful,
> Eks
> 
> 
> 
> 
>   
> 
> 
> 
> 	
> 	
> 		
> ___________________________________________________________ALL-NEW Yahoo! Messenger -
all new features - even more fun!  http://uk.messenger.yahoo.com
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message