Date: 2004-10-11T11:05:38
Editor: NicolasMaisonneuve <nicoo_@hotmail.com>
Wiki: Jakarta Lucene Wiki
Page: SpellChecker
URL: http://wiki.apache.org/jakarta-lucene/SpellChecker
no comment
Change Log:
------------------------------------------------------------------------------
@@ -1,36 +1,48 @@
=== SpellChecker ===
-a Spell Checker allow to suggest a list of words close to a misspelled word. This implementation
is based on the David Spencer code using the n-gram technic and the levensthein distance.
+A spell checker suggests a list of words close to a misspelled word. This implementation
is based on David Spencer's code, which uses the n-gram technique and the Levenshtein distance.
== Structure of a dictionary index ==
-A Index (the dictionary) with all the possible words (a lucene index) must be created.
The structure of this index is (for a 3-4 gram):
-word:
-gram3:
-gram4:
-3start:
-4start:
-3end:
-4end:
-transposition:
-
-== add words to the dictionary ==
-it's independant of the user index. So we can add words becoming to several
-fields of several index for example or, why not, to a file with a list of words.
+An index (the dictionary) containing all the possible words must be created as a Lucene index.
The structure of this index is (for 3-grams and 4-grams):
+|| Index Structure || Example ||
+|| word || kings ||
+|| gram3 || kin, ing, ngs ||
+|| gram4 || king, ings ||
+|| 3start || kin ||
+|| 4start || king ||
+|| 3end || ngs ||
+|| 4end || ings ||
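As a rough illustration of the table above, the following Java sketch derives the same field values for one dictionary word. The class and method names (NGramFieldsSketch, ngrams, fieldsFor) are hypothetical and are not part of the SpellChecker API; the actual indexing code may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only (not the SpellChecker implementation).
class NGramFieldsSketch {

    // All contiguous substrings of length n, from left to right.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Maps each field name of the table to its value(s) for gram sizes min..max.
    static Map<String, List<String>> fieldsFor(String word, int min, int max) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("word", List.of(word));
        for (int n = min; n <= max; n++) {
            List<String> grams = ngrams(word, n);
            if (grams.isEmpty()) {
                continue; // the word is shorter than n characters
            }
            fields.put("gram" + n, grams);
            fields.put(n + "start", List.of(grams.get(0)));
            fields.put(n + "end", List.of(grams.get(grams.size() - 1)));
        }
        return fields;
    }

    public static void main(String[] args) {
        // For "kings": gram3 = [kin, ing, ngs], gram4 = [king, ings], and so on.
        fieldsFor("kings", 3, 4).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Calling fieldsFor("kings", 3, 4) yields the values shown in the Example column.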
-we can add all the keywords of a specific field of your index.
-code:
+== Add words to the dictionary ==
+Words can be added from several fields of several indexes, for example, or even from
a file containing a list of words.
+ * Example: add all the keywords of a specific field of your index.
+ {{{
SpellChecker spell = new SpellChecker(dictionaryDirectory);
-
-spell.addWords(myIndex_Reader, myField)
-
+spell.addWords(myIndex_Reader, myField);
+ }}}
== Get a list of suggested words ==
-The suggestSimilar method return a list of suggests word sorted by the
-Levenshtein distance and optionaly to the popularity of the word for a specific
-field in a user index. More of that, this list can be restricted only to words
-present in a specific field of a user index.
-
+The suggestSimilar method returns a list of suggested words sorted by:
+ 1. the Levenshtein distance (the closest word comes first in the list).
+ 2. (optionally) the popularity of the word in a specific field of a user index.
+
+In addition, this list can be restricted to words present in a specific field of a
user index.
+
+* First example: the suggestSimilar(misspelled_word, num_list) method.
+ "num_list" is the maximum number of words returned. In this example (the simplest),
the list is sorted by Levenshtein distance only.
+{{{
+ String[] l=spellChecker.suggestSimilar("sevanty", 10);
+ //l[0] = "seventy" , l[1] = "seven" , l[2]="seventeen"
+}}}
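To make the ordering in the first example concrete, here is a minimal sketch of a Levenshtein distance and a sort by that distance. The class and method names (LevenshteinSortSketch, distance, sortByDistance) are hypothetical; this is a textbook dynamic-programming version and not necessarily how suggestSimilar computes or ranks its results.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, for illustration only (not the SpellChecker implementation).
class LevenshteinSortSketch {

    // Minimum number of single-character insertions, deletions and
    // substitutions needed to turn a into b (two-row dynamic programming).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns a copy of candidates with the closest word first.
    static String[] sortByDistance(String misspelled, String[] candidates) {
        String[] sorted = candidates.clone();
        Arrays.sort(sorted, Comparator.comparingInt((String c) -> distance(misspelled, c)));
        return sorted;
    }

    public static void main(String[] args) {
        String[] words = {"seventeen", "seven", "seventy"};
        // Distances from "sevanty": seventy = 1, seven = 3, seventeen = 4.
        System.out.println(Arrays.toString(sortByDistance("sevanty", words)));
    }
}
```

With these three candidates the sketch reproduces the order shown above: "seventy", "seven", "seventeen".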
+
+* Second example: the suggestSimilar(misspelled_word, num_list, myIndex_Reader, myField,
morePopular) method.
+
+1. ""Note"": if myIndex_Reader and myField are null, this method behaves the same as the first
method.
+2. The returned words are restricted to the words present in the field "myField"
of the user index "myIndex_Reader".
+3. The list is also sorted by the second criterion (popularity).
+4. If "morePopular" is true and the misspelled word exists in the field of the user index,
only words more frequent than it are returned.
+
See the test case code for an example.
== Download ==