lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "SpellChecker" by DanielNaber
Date Tue, 15 May 2007 19:48:20 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by DanielNaber:
http://wiki.apache.org/jakarta-lucene/SpellChecker

The comment on the change is:
typo and grammar fixes

------------------------------------------------------------------------------
  === SpellChecker ===
  
- A Spell Checker allows to suggest a list of words closed from a misspelled word. This implementation is based on the David Spencer's code using the n-gram method and the Levensthein distance.
+ A Spell Checker allows to suggest a list of words similar to a misspelled word. This implementation is based on David Spencer's code using the n-gram method and the Levenshtein distance.
  
  == Structure of a dictionary index ==
- A  Index (the dictionary) with all the possible words (a lucene index) must be  created. The structure of this index is (for a 3-4 gram).
+ An  index (the dictionary) with all the possible words (a lucene index) must be  created. The structure of this index is (for a 3-4 gram) this:
  || Index Structure || Example ||
  || word || kings ||
  ||gram3|| kin, ing, ngs ||
@@ -15, +15 @@

  ||end3|| ngs||
  ||end4|| ings||
  
- == Importation: add words to the dictionary ==
+ == Import: Adding Words to the Dictionary ==
- we can add the words coming from a Lucene Index (more precisely a set of Lucene fields), why not, from a file with a list of words.
+ We can add the words coming from a Lucene Index (more precisely from a set of Lucene fields), and  from a text file with a list of words.
  
   * Example: we can add all the keywords of a given Lucene field of my index.
   {{{
@@ -24, +24 @@

  spell.indexDictionary(new LuceneDictionary(my_luceneReader,my_fieldname));
   }}}
  
- == get a list of suggested words ==
+ == Getting a List of Suggested Words ==
- The suggestSimilar method return a list of suggested words sorted by:
+ The suggestSimilar method returns a list of suggested words sorted by:
-   1.   the Levenshtein distance (the closest words of the misspelled word is the first of the list).
+   1.   the Levenshtein distance (the most similar word to the misspelled word is the first in the list).
-   2.   (optionaly) the popularity of the word in a given Lucene Field.
+   2.   (optionally) the popularity of the word in a given Lucene Field.
  
- furthermore, that list can be restricted only to the words present in a given Lucene Field.
+ Furthermore, that list can be restricted only to the words present in a given Lucene Field.
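As an illustration of the first sort criterion, the following sketch computes the Levenshtein distance with the standard two-row dynamic-programming table and orders candidates by it. It is not the SpellChecker's own code: the class `LevenshteinSketch`, its methods, and the sample input "sevanty" are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the SpellChecker API.
class LevenshteinSketch {
    // Minimum number of single-character insertions, deletions,
    // and substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns a copy of candidates, closest to the misspelled word first.
    static List<String> rank(String misspelled, List<String> candidates) {
        List<String> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt(c -> distance(misspelled, c)));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(rank("sevanty", Arrays.asList("seven", "seventy")));
    }
}
```

The sort is stable, so candidates at the same distance keep their input order; a second criterion such as popularity would be applied as a tie-breaker.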
  
   * First example: the suggestSimilar(misspelled_word, num_list) method.
    The ''num_list'' is the maximum number of words returned.
@@ -39, +39 @@

     //l[0] = "seventy"
   }}}
  
-  * Second example: the suggestSimilar(misspelled_word, num_list, myIndex_Redear,myField, morePopular)
+  * Second example: the suggestSimilar(misspelled_word, num_list, myIndexReader,myField, morePopular)
-  ''''Note'''': if myIndex_reader and myField are null this method is the same as the first method
+  ''''Note'''': if myIndexReader and myField are null this method is the same as the first method
  
-   1.   The returned words are restricted only to the words presents in the field ''myField'' of the Lucene Index "myIndex_Reader"
+   1.   The returned words are restricted only to the words presents in the field ''myField'' of the Lucene Index "myIndexReader"
-   2.   the list is also sorted with a second criterium: the popularity (the frequence) of the word in the user field
+   2.   The list is also sorted with a second criterium: the popularity (the frequency) of the word in the user field
-   3.   If ''morePopular'' is true and the mispelled word exist in the user field , return only the words more frequent than this.
+   3.   If ''morePopular'' is true and the mispelled word exists in the user field, return only the words more frequent than this.
  
-  See the test case code for example
+  See the test case code for an example.
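The three rules listed for the second example can be sketched in plain Java as follows. This is an assumption-laden illustration, not the library's implementation: `PopularitySketch`, `Candidate`, and `suggest` are hypothetical names, the edit distances are supplied precomputed, and a `Map` of word frequencies stands in for the Lucene field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the SpellChecker API.
class PopularitySketch {
    static final class Candidate {
        final String word;
        final int distance; // precomputed edit distance to the misspelled word
        Candidate(String word, int distance) {
            this.word = word;
            this.distance = distance;
        }
    }

    // fieldFreq maps each word of the user field to its frequency (assumed input).
    static List<String> suggest(String misspelled, List<Candidate> candidates,
                                Map<String, Integer> fieldFreq,
                                boolean morePopular, int numList) {
        int threshold = morePopular ? fieldFreq.getOrDefault(misspelled, 0) : 0;
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            int freq = fieldFreq.getOrDefault(c.word, 0);
            if (freq == 0) continue;                        // rule 1: must occur in the field
            if (morePopular && freq <= threshold) continue; // rule 3: strictly more frequent
            kept.add(c);
        }
        // Rule 2: distance ascending, then frequency descending.
        kept.sort((x, y) -> {
            if (x.distance != y.distance) return Integer.compare(x.distance, y.distance);
            return Integer.compare(fieldFreq.getOrDefault(y.word, 0),
                                   fieldFreq.getOrDefault(x.word, 0));
        });
        List<String> out = new ArrayList<>();
        for (Candidate c : kept) {
            if (out.size() == numList) break;
            out.add(c.word);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = new HashMap<>();
        freq.put("seventy", 10);
        freq.put("seven", 50);
        List<Candidate> cands = new ArrayList<>();
        cands.add(new Candidate("seven", 3));
        cands.add(new Candidate("seventy", 1));
        System.out.println(suggest("sevanty", cands, freq, false, 5));
    }
}
```

When the misspelled word does not occur in the field, the threshold is zero, so `morePopular` filters nothing beyond rule 1.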
- 
  
  == Changes ==
  Version 1.1 :
   * sort fixed (the sort was inversed!)
-  * set gram dynamicaly (depending of the length of the word)
+  * set gram dynamically (depending of the length of the word)
   * use the FuzzyQuery score: ((edit distance)/(length of word))
-  * new Dictionary interface + LuceneDictionary  and PlaintextDictionary implementation
+  * new Dictionary interface + LuceneDictionary and PlaintextDictionary implementation
   * replace addWords method by indexDictionary(Dictionnary dic)
-  * add  a new public method: boolean exist(word)
+  * add a new public method: boolean exist(word)
   * add a build.xml
  
  == Credits ==
