lucene-dev mailing list archives

From lucene-...@jakarta.apache.org
Subject [Jakarta Lucene Wiki] Updated: SpellChecker
Date Mon, 11 Oct 2004 18:05:38 GMT
   Date: 2004-10-11T11:05:38
   Editor: NicolasMaisonneuve <nicoo_@hotmail.com>
   Wiki: Jakarta Lucene Wiki
   Page: SpellChecker
   URL: http://wiki.apache.org/jakarta-lucene/SpellChecker

   no comment

Change Log:

------------------------------------------------------------------------------
@@ -1,36 +1,48 @@
 === SpellChecker ===
 
-a Spell Checker allow to suggest a list of words close to a misspelled word. This implementation
is based on the David Spencer code using the n-gram technic and the levensthein distance.

+A spell checker suggests a list of words close to a misspelled word. This implementation is based on David Spencer's code, using the n-gram technique and the Levenshtein distance.

 
 == Structure of a dictionary index ==
-A  Index (the dictionary) with all the possible words (a lucene index) must be  created.
The structure of this index is (for a 3-4 gram):
-word:
-gram3:
-gram4:
-3start:
-4start:
-3end:
-4end:
-transposition:
- 
-== add words to the dictionary ==
-it's independant of the user index. So we can add words becoming to several
-fields of several index for example or, why not, to a file with a list of words.
+An index (the dictionary, itself a Lucene index) containing all the possible words must be created. For 3-grams and 4-grams, the structure of this index is:
+|| Index Structure || Example ||
+|| word || kings ||
+||gram3|| kin, ing, ngs ||
+||gram4|| king, ings||
+||3start|| kin||
+||4start|| king||
+||3end|| ngs||
+||4end|| ings||
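
The field values in the table above can be reproduced with a few lines of plain Java. This is only a sketch of the decomposition, not the SpellChecker's own indexing code: the class and method names (NGramFields, grams, start, end) are hypothetical, and returning null for words shorter than n is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the SpellChecker code.
class NGramFields {

    // All contiguous substrings of length n, in order of appearance.
    static List<String> grams(String word, int n) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            result.add(word.substring(i, i + n));
        }
        return result;
    }

    // First n characters (the "3start"/"4start" fields); null if the word is too short.
    static String start(String word, int n) {
        return word.length() >= n ? word.substring(0, n) : null;
    }

    // Last n characters (the "3end"/"4end" fields); null if the word is too short.
    static String end(String word, int n) {
        return word.length() >= n ? word.substring(word.length() - n) : null;
    }

    public static void main(String[] args) {
        String word = "kings";
        for (int n = 3; n <= 4; n++) {
            System.out.println("gram" + n + ": " + grams(word, n));
            System.out.println(n + "start: " + start(word, n));
            System.out.println(n + "end: " + end(word, n));
        }
    }
}
```

For "kings" this yields the same grams, starts and ends as the Example column.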
 
-we can add all the keywords of a specific field of your index.
-code:
+== Add words to the dictionary ==
+Words can come from several fields of several indexes, for example, or even from a file containing a list of words.
 
+ *   Example: adding all the keywords of a specific field of your index.
+ {{{
 SpellChecker spell= new SpellChecker(dictionaryDirectory);
- 
-spell.addWords(myIndex_Reader, myField)
-
+spell.addWords(myIndex_Reader, myField);
+ }}}
 
 == Get a list of suggested words ==
-The suggestSimilar method return a list of suggests word sorted by the
-Levenshtein distance and optionaly to the popularity of the word for a specific
-field in a user index. More of that, this list can be restricted only to words
-present in a specific field of a user index.
- 
+The suggestSimilar method returns a list of suggested words sorted by:
+  1.   the Levenshtein distance (the closest word comes first in the list).
+  2.   (optionally) the popularity of the word for a specific field in a user index.
+
+In addition, this list can be restricted to words present in a specific field of a user index.
+
+*   First example: the suggestSimilar(misspelled_word, num_list) method.
+  "num_list" is the maximum number of words returned. In this example (the simplest), the list is sorted only by Levenshtein distance.
+{{{
+   String[] l=spellChecker.suggestSimilar("sevanty", 10);
+   //l[0] = "seventy" , l[1] = "seven" , l[2]="seventeen"
+}}}
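
For readers unfamiliar with the first sort criterion, the Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a textbook dynamic-programming version, not the code used by the SpellChecker; the class name LevenshteinSketch is hypothetical.

```java
// Hypothetical illustration of the Levenshtein (edit) distance; not the SpellChecker's code.
class LevenshteinSketch {

    // Minimum number of insertions, deletions and substitutions turning a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous row (prefix of a)
        int[] curr = new int[b.length() + 1]; // distances for the row being computed
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("sevanty", "seventy"));
        System.out.println(distance("sevanty", "seven"));
    }
}
```

With this definition, "seventy" is a single substitution away from "sevanty", while "seven" needs three edits, which matches the ordering shown in the first example.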
+
+*   Second example: the suggestSimilar(misspelled_word, num_list, myIndex_Reader, myField, morePopular) method.
+
+1.   ""Note"": if myIndex_Reader and myField are null, this method behaves the same as the first method.
+2.   The returned words are restricted to the words present in the field "myField" of the user index "myIndex_Reader".
+3.   The list is also sorted by the second criterion (popularity).
+4.   If "morePopular" is true and the misspelled word exists in the field of the user index, only words more frequent than it are returned.
+
 See the test case code for an example.
 
 == Download ==

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

