lucene-dev mailing list archives

From lucene-...@jakarta.apache.org
Subject [Jakarta Lucene Wiki] New: SpellChecker
Date Mon, 11 Oct 2004 17:21:52 GMT
   Date: 2004-10-11T10:21:52
   Editor: NicolasMaisonneuve <nicoo_@hotmail.com>
   Wiki: Jakarta Lucene Wiki
   Page: SpellChecker
   URL: http://wiki.apache.org/jakarta-lucene/SpellChecker

   no comment

New Page:

SpellChecker

A spell checker suggests a list of words close to a misspelled word. This implementation
uses the n-gram technique and the Levenshtein distance.
An index (the dictionary) containing all the possible words (a Lucene index) must be created. The
structure of this index is (for 3-grams and 4-grams):
word:
gram3:
gram4:
3start:
4start:
3end:
4end:
transposition:
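
As a rough illustration of how one dictionary word could be broken down into the fields listed above, here is a minimal sketch. The class and method names (NGramFieldsSketch, grams, fieldsFor) are hypothetical, and the field contents are inferred from the field names only: gramN holds every n-gram of the word, Nstart the first one and Nend the last one. The transposition field is left out because the page does not describe what it stores. The sketch builds plain Java collections and does not use the Lucene API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NGramFieldsSketch {

    /** Returns every substring of length n of the word, from left to right. */
    static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    /**
     * Builds a field-name to values map for one dictionary word.
     * Assumption: "Nstart"/"Nend" hold the first/last n-gram of the word.
     */
    static Map<String, List<String>> fieldsFor(String word, int minGram, int maxGram) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("word", List.of(word));
        for (int n = minGram; n <= maxGram; n++) {
            List<String> g = grams(word, n);
            if (g.isEmpty()) {
                continue; // the word is shorter than n, so it has no n-gram
            }
            fields.put("gram" + n, g);
            fields.put(n + "start", List.of(g.get(0)));
            fields.put(n + "end", List.of(g.get(g.size() - 1)));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Print the fields of one word for a 3-4 gram dictionary.
        fieldsFor("lucene", 3, 4).forEach((name, values) ->
                System.out.println(name + ": " + values));
    }
}
```

Under these assumptions, a misspelled word would be split the same way and its grams used to query the dictionary index, so that words sharing many grams with it come back as candidates.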
 
It is independent of the user index, so we can add words coming from several
fields of several indexes, for example, or, why not, from a file containing a list of words.

source:

SpellChecker spellChecker = new SpellChecker();


The suggestSimilar method returns a list of suggested words sorted by
Levenshtein distance and, optionally, by the popularity of the word for a specific
field in a user index. In addition, this list can be restricted to words
present in a specific field of a user index.
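
To make the sorting step concrete, here is a minimal sketch of the standard Levenshtein distance and of ranking candidate words by it. It is not the code attached to the bug report: the names LevenshteinSketch, distance and rank are hypothetical, and the optional popularity ordering and the restriction to a field of a user index are not modelled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LevenshteinSketch {

    /** Classic dynamic-programming edit distance (insert, delete, substitute), two rows. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // substitution or match
            }
            int[] tmp = prev; // reuse the two rows instead of allocating new ones
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns a copy of the candidates ordered by increasing distance to the misspelled word. */
    static List<String> rank(String misspelled, List<String> candidates) {
        List<String> sorted = new ArrayList<>(candidates);
        // List.sort is stable, so candidates at equal distance keep their input order.
        sorted.sort(Comparator.comparingInt((String c) -> distance(misspelled, c)));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(rank("lucen", List.of("license", "lucene", "lucent")));
    }
}
```

In a setup like the one described here, the candidates would presumably be the words returned by the n-gram query on the dictionary index, so the distance only has to be computed for a short list.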
 
See the test case code for an example.

The files can be downloaded from [http://issues.apache.org/bugzilla/show_bug.cgi?id=31617]


