lucene-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: for a better spellchecker
Date Mon, 09 Jul 2007 00:38:04 GMT

: Currently, SpellChecker uses the trigram algorithm to find similar words. It
: works well for keyboard fumbles, but not well enough for short words
: and for languages like French, where the same sound can be written
: differently.
: Spellchecking is a classic computing task, and aspell provides some
: nice, free (it's GNU) sound dictionaries. Lots of dictionaries are
: available.
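To make the complaint about trigrams concrete, here is a minimal Java sketch of plain character-trigram overlap scored as Jaccard similarity. It is not the contrib/spellchecker code. The class name `TrigramSimilarity`, the `$` word-boundary padding and the Jaccard scoring are assumptions made purely for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class TrigramSimilarity {
    // Collect the character trigrams of a word. Padding with '$' (an
    // assumption of this sketch) lets word boundaries contribute grams too.
    static Set<String> trigrams(String word) {
        String padded = "$" + word.toLowerCase() + "$";
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            grams.add(padded.substring(i, i + 3));
        }
        return grams;
    }

    // Jaccard similarity: shared trigrams divided by all distinct trigrams.
    static double similarity(String a, String b) {
        Set<String> ga = trigrams(a);
        Set<String> gb = trigrams(b);
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(ga);
        shared.retainAll(gb);
        return (double) shared.size() / union.size();
    }

    public static void main(String[] args) {
        // 0.5: a transposition in a long word keeps many shared trigrams.
        System.out.println(similarity("spellchecker", "spellchekcer"));
        // 0.0: these short French homophones share no trigrams at all.
        System.out.println(similarity("eau", "haut"));
    }
}
```

Under these assumptions, a keyboard transposition in a long word ("spellchecker" vs "spellchekcer") keeps half of its trigrams, while "eau" and "haut", which sound the same in French, share none. That is the gap a sound-based dictionary such as aspell's is meant to fill.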

The topic of "spell correction" as it pertains to Lucene users can really
have two meanings:
  a) an attempt to suggest potential spelling corrections for query strings
provided by a user, as a form of input pre-processing
  b) to use Lucene as a tool to suggest spell corrections based on a known
corpus.
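Meaning "b" can be sketched without Lucene at all. Count the words of a known corpus, then suggest the known word closest to the input by Levenshtein distance, preferring the more frequent word on ties. This brute-force Java sketch only illustrates the idea. The class `CorpusSuggester`, its `suggest(query, maxDist)` method and the tie-breaking rule are hypothetical, and a real implementation would find candidates through an index rather than by scanning every term.

```java
import java.util.HashMap;
import java.util.Map;

public class CorpusSuggester {
    // Word frequencies from the known corpus (hypothetical helper class).
    private final Map<String, Integer> freq = new HashMap<>();

    public CorpusSuggester(String corpus) {
        for (String w : corpus.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) {
                freq.merge(w, 1, Integer::sum);
            }
        }
    }

    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Closest corpus word within maxDist edits; ties go to the more frequent
    // word. Returns null when no corpus word is close enough.
    public String suggest(String query, int maxDist) {
        String q = query.toLowerCase();
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        int bestFreq = -1;
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            int d = levenshtein(q, e.getKey());
            if (d > maxDist) {
                continue;
            }
            if (d < bestDist || (d == bestDist && e.getValue() > bestFreq)) {
                best = e.getKey();
                bestDist = d;
                bestFreq = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        CorpusSuggester s =
            new CorpusSuggester("lucene indexes text; lucene searches text quickly");
        System.out.println(s.suggest("lucine", 2));  // lucene
        System.out.println(s.suggest("serches", 2)); // searches
    }
}
```

Because suggestions can only come from words that actually occur in the corpus, this is "b" in the sense above: the corpus, not a general-purpose dictionary, defines what counts as correct.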

The contrib/spellchecker code is an application of "B" -- it may in fact
be useful for "A" but that doesn't mean there aren't other non-Lucene
tools for achieving "A" as well.

: I wrote a Python parser that generates translation code in different
: languages: Python, PHP and Java, a bit like the Snowball stuff.
: Little work is needed to generate Lucene-compliant code. But is the
: Python generator good enough for Lucene, or must it be translated
: into Java to go into the Lucene source?

The Lucene-Java repository tends to be about Java code, but
contrib/javascript is an example of code that may be of general use to
Lucene-Java users but isn't Java.


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

