lucene-dev mailing list archives

From lucene-...@jakarta.apache.org
Subject [Jakarta Lucene Wiki] Updated: SpellChecker
Date Fri, 22 Oct 2004 02:10:49 GMT
   Date: 2004-10-21T19:10:49
   Editor: NicolasMaisonneuve <nicoo_@hotmail.com>
   Wiki: Jakarta Lucene Wiki
   Page: SpellChecker
   URL: http://wiki.apache.org/jakarta-lucene/SpellChecker

   no comment

Change Log:

------------------------------------------------------------------------------
@@ -1,6 +1,6 @@
 === SpellChecker ===
 
-A spell checker suggests a list of words close to a misspelled word. This implementation
is based on David Spencer's code and uses the n-gram technique and the Levenshtein distance.

+A spell checker suggests a list of words close to a misspelled word. This implementation
is based on David Spencer's code and uses the n-gram technique and the Levenshtein distance.
 
 == Structure of a dictionary index ==
 An index (the dictionary) containing all the possible words must be created as a Lucene index.
The structure of this index is as follows (for 3-grams and 4-grams).
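
As a rough illustration of the n-gram technique mentioned above, the following sketch splits a word into its 3-grams and 4-grams. The class and method names (NGramSketch, ngrams) are hypothetical and are not taken from the SpellChecker source; how the grams are actually stored in the dictionary index is not shown in this excerpt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the SpellChecker code.
public class NGramSketch {

    /** Returns every contiguous substring of length n, in left-to-right order. */
    public static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("seventy", 3)); // [sev, eve, ven, ent, nty]
        System.out.println(ngrams("seventy", 4)); // [seve, even, vent, enty]
    }
}
```

A misspelled word such as "sevanty" still shares the grams "sev" and "nty" with "seventy", which is what makes n-grams useful for retrieving candidate words before they are ranked.
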
@@ -25,12 +25,12 @@
 == Get a list of suggested words ==
 The suggestSimilar method returns a list of suggested words sorted by:
   1.   the Levenshtein distance (the word closest to the misspelled word comes first in the
list).
-  2.   (optionally) the popularity of the word for a specific field in a user index.
+  2.   (optionally) the popularity of the word for a specific field in a user index.
 
 In addition, this list can be restricted to words present in a specific field of a
user index.
 
  * First example: the suggestSimilar(misspelled_word, num_list) method.
-  The ''num_list'' is the maximum number of words returned. 
+  The ''num_list'' is the maximum number of words returned.
   In this example the list is sorted by Levenshtein distance only.
  {{{
    String[] l=spellChecker.suggestSimilar("sevanty", 2);
@@ -41,26 +41,26 @@
  '''Note''': if myIndex_reader and myField are null, this method behaves like the first
method
 
   1.   The returned words are restricted to the words present in the field ''myField''
of the user index "myIndex_Reader"
-  2.   The list is also sorted by a second criterion: the popularity (the frequency) of
the word in the user field
+  2.   The list is also sorted by a second criterion: the popularity (the frequency) of
the word in the user field
   3.   If ''morePopular'' is true and the misspelled word exists in the user field, only
the words more frequent than it are returned.
 
- See the test case code for an example
+ See the test case code for an example
 
 == Download ==
 attachment:spellchecker1.1.zip
 
-== Changes ==
-1.1
-- sort fixed (the sort order was inverted!)
-- gram size set dynamically (depending on the length of the word)
-- use the FuzzyQuery score: ((edit distance)/(length of word))
-- new Dictionary interface + LuceneDictionary and PlaintextDictionary implementations
-- replace the addWords method with indexDictionary(Dictionary dic)
-- add a new public method: boolean exist(word)
-- add a build.xml
+
+== Changes ==
+Version 1.1:
+ * sort fixed (the sort order was inverted!)
+ * gram size set dynamically (depending on the length of the word)
+ * use the FuzzyQuery score: ((edit distance)/(length of word))
+ * new Dictionary interface + LuceneDictionary and PlaintextDictionary implementations
+ * replace the addWords method with indexDictionary(Dictionary dic)
+ * add a new public method: boolean exist(word)
+ * add a build.xml
 
 == Credits ==
  *   Maisonneuve Nicolas
  *   Spencer David
-
 

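The ordering described for suggestSimilar in the page above (Levenshtein distance first, then optionally the popularity of the word in a user field) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SpellChecker implementation: the class RankingSketch, its methods, and the in-memory frequency map standing in for the user index are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real SpellChecker reads candidates from a Lucene index.
public class RankingSketch {

    /** Classic dynamic-programming Levenshtein distance using two rows. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Ranks candidate words by edit distance to the misspelled word, breaking
     * ties by descending frequency. 'freq' is an assumed stand-in for the
     * frequencies of the words in the user field.
     */
    public static List<String> rank(String misspelled, Map<String, Integer> freq, int max) {
        List<String> words = new ArrayList<>(freq.keySet());
        words.sort(Comparator
                .comparingInt((String w) -> levenshtein(misspelled, w))
                .thenComparingInt(w -> -freq.get(w)));
        return new ArrayList<>(words.subList(0, Math.min(max, words.size())));
    }
}
```

With this sketch, a candidate at distance 1 always precedes one at distance 3, and frequency only decides the order among candidates at the same distance.
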
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

