lucene-dev mailing list archives

From Mathieu Lecarme <math...@garambrogne.net>
Subject for a better spellchecker
Date Fri, 06 Jul 2007 18:37:45 GMT
Currently, SpellChecker uses a trigram algorithm to find similar words.  
It works well for keyboard slips, but not well enough for short words  
or for languages like French, where the same sound can be written in  
different ways.
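
To illustrate the weakness described above, here is a minimal sketch of trigram overlap. It is a simplification for illustration only, not Lucene's SpellChecker code: the names `trigrams` and `trigram_similarity`, the `^`/`$` boundary markers and the Jaccard score are all assumptions.

```python
def trigrams(word):
    """Return the set of character trigrams of a word, padded with boundary markers."""
    padded = "^" + word + "$"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard overlap between the trigram sets of two words (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


# A long word with one dropped letter still shares most of its trigrams.
print(trigram_similarity("spellchecker", "spelchecker"))
# A short word with one wrong letter shares none.
print(trigram_similarity("cat", "cot"))
# French homophones "seau" and "sot" share none either.
print(trigram_similarity("seau", "sot"))
```

With only three trigrams per three-letter word, a single changed letter touches all of them, which is why short words and homophones with different spellings fall through.
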
Spellchecking is a classic computing task, and aspell provides some  
nice and free (it's GNU) sound dictionaries. Lots of dictionaries are  
available.
I wrote a Python parser which generates translation code in different  
languages: Python, PHP and Java. A bit like the Snowball stuff.
A little work remains to generate Lucene-compliant code. But is the  
Python generator good enough for Lucene, or must it be translated to  
Java to go into the Lucene source?

I'll soon start a PhonemeSpellChecker which overrides the trigram  
SpellChecker.
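
A phoneme-based lookup could be sketched roughly as follows. This is a toy illustration under stated assumptions: the three rules in `RULES` are hypothetical and far smaller than real aspell phonetic tables, and `phonetic_key`, `build_index` and `suggest` are made-up names, not part of Lucene or aspell.

```python
from collections import defaultdict

# Hypothetical, heavily simplified French-like rules (longest patterns first).
RULES = [
    ("eau", "o"),
    ("au", "o"),
    ("ph", "f"),
]


def phonetic_key(word):
    """Rewrite a word left to right into a rough sound key using RULES."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        for pattern, replacement in RULES:
            if word.startswith(pattern, i):
                key.append(replacement)
                i += len(pattern)
                break
        else:
            # No rule applies at this position: keep the letter as is.
            key.append(word[i])
            i += 1
    return "".join(key)


def build_index(words):
    """Group dictionary words by their phonetic key."""
    index = defaultdict(list)
    for w in words:
        index[phonetic_key(w)].append(w)
    return index


def suggest(index, word):
    """Return the dictionary words that sound like the given word."""
    return index.get(phonetic_key(word), [])


index = build_index(["bateau", "photo", "chapeau"])
print(suggest(index, "bato"))   # ['bateau']
print(suggest(index, "foto"))   # ['photo']
```

Because the misspelling and the dictionary word are compared by key rather than by letters, words that sound alike match even when they share no trigram.
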

Next step is to implement a word cutter, just like Google Suggest.
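
Assuming "word cutter" means splitting run-together input into dictionary words, one possible approach is a small dynamic program over the input. The function name `split_words` and the choice to prefer the fewest words are assumptions made for this sketch.

```python
def split_words(text, dictionary):
    """Split text into dictionary words, preferring the fewest words.

    Returns a list of words, or None if no full split exists.
    """
    # best[i] holds the shortest word list covering text[:i], or None.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            piece = text[start:end]
            if best[start] is not None and piece in dictionary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


words = {"spell", "check", "checker", "er"}
print(split_words("spellchecker", words))  # ['spell', 'checker']
print(split_words("xyz", words))           # None
```
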

Any suggestions?

M.
