lucene-dev mailing list archives

From "J. Delgado" <jdelg...@lendingclub.com>
Subject Re: for a better spellchecker
Date Fri, 06 Jul 2007 20:42:53 GMT
Instead of "overriding" the trigram approach, you may want to combine
the two. That is, create trigrams out of the dictionary's word list and
weigh those matches much higher than the ones coming from the index, or
even do an exact dictionary lookup first and fall back to a
trigram/index-based lookup if it fails.
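
J.D.'s combined lookup can be sketched in plain Java, independent of Lucene's own SpellChecker classes. Everything here is an illustrative assumption, not Lucene's API or J.D.'s exact design. That covers the hypothetical class `CombinedSuggester`, the `_` padding that gives short words at least a few trigrams, the Dice coefficient used as the similarity score, and the `dictionaryBoost` multiplier applied to dictionary words.

```java
import java.util.*;

/**
 * Hypothetical sketch of a combined lookup: an exact dictionary hit first,
 * then trigram similarity over dictionary words and index terms, with
 * dictionary words weighted higher. Not Lucene's API; names are illustrative.
 */
class CombinedSuggester {
    private final Set<String> dictionary;
    private final Set<String> indexTerms;
    private final double dictionaryBoost; // assumed multiplier for dictionary matches

    CombinedSuggester(Collection<String> dictionary, Collection<String> indexTerms,
                      double dictionaryBoost) {
        this.dictionary = new HashSet<>(dictionary);
        this.indexTerms = new HashSet<>(indexTerms);
        this.dictionaryBoost = dictionaryBoost;
    }

    /** Trigrams of the word padded with '_' so short words still yield some grams. */
    static Set<String> trigrams(String word) {
        String padded = "_" + word.toLowerCase(Locale.ROOT) + "_";
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            grams.add(padded.substring(i, i + 3));
        }
        return grams;
    }

    /** Dice coefficient between the trigram sets of two words, in [0, 1]. */
    static double similarity(String a, String b) {
        Set<String> ga = trigrams(a), gb = trigrams(b);
        if (ga.isEmpty() || gb.isEmpty()) return 0.0;
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        return 2.0 * common.size() / (ga.size() + gb.size());
    }

    /** Best suggestion, the word itself if the dictionary has it, or null if nothing shares a trigram. */
    String suggest(String word) {
        if (dictionary.contains(word)) return word;          // 1. exact dictionary lookup
        Set<String> candidates = new TreeSet<>(dictionary);   // sorted, so ties are deterministic
        candidates.addAll(indexTerms);
        String best = null;
        double bestScore = 0.0;
        for (String candidate : candidates) {                 // 2. trigram fallback
            double score = similarity(word, candidate);
            if (dictionary.contains(candidate)) score *= dictionaryBoost; // favor dictionary words
            if (score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }
}
```

Suppose the dictionary contains "spelling" and the index contains the misspelled term "speling". With `dictionaryBoost` set to 2.0, `suggest("speling")` returns "spelling" (trigram score 0.8, boosted to 1.6) rather than the identical index term (score 1.0). With a boost of 1.0, the index term wins. An exact dictionary hit skips the trigram pass entirely.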

J.D.

2007/7/6, Mathieu Lecarme <mathieu@garambrogne.net>:
>
> Currently, SpellChecker uses the trigram algorithm to find similar words.
> It works well for keyboard fumbles, but not well enough for short words
> or for languages like French, where the same sound can be written in
> several different ways.
> Spellchecking is a classic computing task, and aspell provides some
> nice, free (GNU) phonetic ("sounds-like") dictionaries. Lots of
> dictionaries are available.
> I wrote a Python parser that generates translation code in several
> languages: Python, PHP, and Java, a bit like Snowball.
> Only a little work is needed to generate Lucene-compliant code. But is
> the Python generator good enough for Lucene, or must the translation be
> done in Java to go into the Lucene source?
>
> I'll soon start a PhonemeSpellChecker, which overrides the trigram
> SpellChecker.
>
> The next step is to implement a word cutter, just like Google Suggest.
>
> Any suggestions?
>
> M.
>
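
The PhonemeSpellChecker Mathieu proposes could be approximated by a key-based lookup. Each word is reduced to a phonetic key by ordered rewrite rules, and dictionary words that share the query's key become suggestions. This is a minimal sketch under stated assumptions. The class `PhoneticSuggester`, its methods, the five rewrite rules, and the set of silent final letters are hypothetical. They form a toy French-like rule set, not aspell's actual phonetic data.

```java
import java.util.*;

/**
 * Hypothetical phoneme-based lookup: words are reduced to a phonetic key
 * by ordered rewrite rules, and words sharing a key are suggested together.
 * The rules are a toy French-like set for illustration, NOT aspell's data.
 */
class PhoneticSuggester {
    // Ordered rules: at each position, the first matching pattern wins.
    private static final String[][] RULES = {
        {"eau", "o"}, {"au", "o"}, {"ph", "f"}, {"sc", "s"}, {"qu", "k"}
    };
    // Letters assumed to be silent at the end of a word.
    private static final String SILENT_FINALS = "destx";

    private final Map<String, List<String>> byKey = new HashMap<>();

    PhoneticSuggester(Collection<String> dictionary) {
        for (String w : dictionary) {
            byKey.computeIfAbsent(key(w), k -> new ArrayList<>()).add(w);
        }
    }

    /** Applies the rewrite rules, collapses doubled letters, drops silent finals. */
    static String key(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < w.length()) {
            boolean matched = false;
            for (String[] rule : RULES) {
                if (w.startsWith(rule[0], i)) {
                    out.append(rule[1]);
                    i += rule[0].length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append(w.charAt(i));
                i++;
            }
        }
        StringBuilder key = new StringBuilder();
        for (int j = 0; j < out.length(); j++) {   // "ll" -> "l"
            char c = out.charAt(j);
            if (key.length() == 0 || key.charAt(key.length() - 1) != c) key.append(c);
        }
        while (key.length() > 1 && SILENT_FINALS.indexOf(key.charAt(key.length() - 1)) >= 0) {
            key.setLength(key.length() - 1);       // keep at least one character
        }
        return key.toString();
    }

    /** Dictionary words with the same phonetic key, in insertion order. */
    List<String> suggest(String word) {
        return byKey.getOrDefault(key(word), Collections.emptyList());
    }
}
```

With the dictionary `saut, seau, sceau, sot, photo`, `suggest("so")` returns `[saut, seau, sceau, sot]`, all pronounced /so/ in French. A trigram matcher has a hard time here: with `_` padding, "sot" and "sceau" share no trigram at all.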

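For the word cutter Mathieu mentions, one common technique is dictionary-based word segmentation with dynamic programming. It splits a run-together query into known words and prefers the split with the fewest words. This is only one possible reading of the plan, and nothing here reflects how Google Suggest actually works. The class `WordCutter` and its `cut` method are hypothetical.

```java
import java.util.*;

/**
 * Hypothetical word cutter: splits a run-together query into dictionary
 * words using dynamic programming, preferring the fewest words.
 * Returns null when no full segmentation exists.
 */
class WordCutter {
    static List<String> cut(String text, Set<String> dictionary) {
        int n = text.length();
        int[] best = new int[n + 1];   // best[i] = fewest words covering text[0, i), -1 if unreachable
        int[] prev = new int[n + 1];   // prev[i] = start of the last word in that best split
        Arrays.fill(best, -1);
        best[0] = 0;
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                if (best[start] < 0 || !dictionary.contains(text.substring(start, end))) continue;
                int count = best[start] + 1;
                if (best[end] < 0 || count < best[end]) {
                    best[end] = count;
                    prev[end] = start;
                }
            }
        }
        if (best[n] < 0) return null;
        LinkedList<String> words = new LinkedList<>();
        for (int end = n; end > 0; end = prev[end]) {   // walk back through the split points
            words.addFirst(text.substring(prev[end], end));
        }
        return words;
    }
}
```

With the dictionary `lucene, spell, checker, spellchecker`, `cut("lucenespellchecker", dict)` yields `[lucene, spellchecker]` rather than the three-word split. The cut words could then be passed to the spellchecker one by one.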