lucene-dev mailing list archives

From Mathieu Lecarme <>
Subject Re: for a better spellchecker
Date Fri, 13 Jul 2007 20:19:12 GMT
The SpellChecker code mixes indexing, n-gram handling, and querying
functions. Extending it as it stands will not produce clean code.
Would it make sense to first refactor the SpellChecker code, extracting
the dictionary-reading function and the indexing/searching functions?
SpellChecker would get a method to add a SpellEngine, an interface which
looks like

interface SpellEngine {
	public void addWord(String word);
	public String[] suggestSimilar(String word, int numSug);
}

and something to sort suggestions by, such as a "distance" for each suggested word.
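
To make the idea concrete, here is a minimal sketch of what one such engine could look like. Only the SpellEngine interface comes from the proposal above; the class name EditDistanceSpellEngine, the in-memory word set, and the choice of Levenshtein edit distance as the "distance" are assumptions made for illustration. It is not the existing contrib/spellchecker code, and it uses Java 8 syntax for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// The interface proposed in this message.
interface SpellEngine {
    void addWord(String word);
    String[] suggestSimilar(String word, int numSug);
}

// Hypothetical engine: keeps words in memory and ranks them by edit distance.
// A real engine would query an index instead of scanning every word.
class EditDistanceSpellEngine implements SpellEngine {
    private final Set<String> words = new LinkedHashSet<>();

    @Override
    public void addWord(String word) {
        words.add(word);
    }

    @Override
    public String[] suggestSimilar(String word, int numSug) {
        List<String> candidates = new ArrayList<>(words);
        // Stable sort: words at the same distance keep their insertion order.
        candidates.sort(Comparator.comparingInt((String w) -> distance(word, w)));
        int limit = Math.max(0, Math.min(numSug, candidates.size()));
        return candidates.subList(0, limit).toArray(new String[0]);
    }

    // Levenshtein distance computed with two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

With the engine behind an interface like this, a trigram-based engine and a phonetic one could presumably be plugged into SpellChecker side by side, each supplying its own notion of distance.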


On 9 Jul 2007, at 02:38, Chris Hostetter wrote:

> : Now, SpellChecker uses the trigram algorithm to find similar words. It
> : works well for keyboard fumbles, but not well enough for short words
> : and for languages like French where the same sound can be written
> : differently.
> : Spellchecking is a classical computer task, and aspell provides some
> : nice and free (it's GNU) sound dictionaries. Lots of dictionaries are
> : available.
> The topic of "spell correction" as it pertains to Lucene users can really
> have two meanings:
>   a) an attempt to suggest potential spell corrections of query strings
> provided by a user as a form of input pre-processing
>   b) to use Lucene as a tool to suggest spell corrections based on a known
> corpus.
> The contrib/spellchecker code is an application of "B" -- it may in fact
> be useful for "A" but that doesn't mean there aren't other non-Lucene
> tools for achieving "A" as well.
> : I wrote a Python parser which generates translation code in different
> : languages: Python, PHP and Java. A bit like the snowball stuff.
> : A little work remains to generate Lucene-compliant code. But is the
> : Python generator good enough for Lucene, or must it be translated
> : to Java to be put in the Lucene source?
> The Lucene-Java repository tends to be about Java code, but
> contrib/javascript is an example of code that may be of general use to
> Lucene-Java users that isn't Java.
> -Hoss