lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "SpellCheckComponent" by JamesDyer
Date Wed, 23 May 2012 21:57:51 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SpellCheckComponent" page has been changed by JamesDyer:
http://wiki.apache.org/solr/SpellCheckComponent?action=diff&rev1=55&rev2=56

Comment:
SOLR-2585:  Context-Sensitive Suggestions & Collations

    <str name="buildOnCommit">true</str>
  </lst>
  }}}
- 
  <!> '''NOTE''': Building on commit is very expensive and is discouraged for most production
systems. For large indexes, one commit may take minutes since the building of spellcheck dictionary
is single threaded. Use buildOnOptimize or explicit build instead.
  
  <<Anchor(onOptimize)>>
@@ -186, +185 @@

  {{{
  <str name="buildOnOptimize">true</str>
  }}}
- 
  == thresholdTokenFrequency ==
  For use with IndexBasedSpellChecker or DirectSolrSpellChecker.  This specifies the fraction
of documents (written as a decimal, so .01 means 1%) in which a term must occur in order to
be included in any spelling suggestions.
 (In the case of IndexBasedSpellChecker, only terms that meet this requirement will be indexed
in the spelling dictionary.)  For example, the following configuration line limits the dictionary
to terms that occur in at least 1% of the documents:
  
  {{{
  <float name="thresholdTokenFrequency">.01</float>
  }}}
- 
  Note that this does not affect whether or not a user's query is considered to be correctly
spelled, because these spell checkers never offer suggestions for terms included in the full original
documents.  However, specifying thresholdTokenFrequency will prevent low-frequency terms from
being offered as spelling suggestions.
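
The threshold rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Solr's implementation: the function name `terms_meeting_threshold`, the sample terms, and the document counts are all hypothetical.

```python
def terms_meeting_threshold(doc_freqs, num_docs, threshold):
    """Return the terms that occur in at least `threshold` (a fraction
    between 0 and 1) of the `num_docs` documents."""
    return sorted(
        term for term, df in doc_freqs.items()
        if df / num_docs >= threshold
    )

# Hypothetical document frequencies in an index of 1000 documents.
doc_freqs = {"java": 120, "jawa": 2, "loading": 45, "lording": 9}

# With a threshold of .01, a term must appear in at least 1% of documents.
eligible = terms_meeting_threshold(doc_freqs, num_docs=1000, threshold=0.01)
print(eligible)
```

In this made-up index, "jawa" and "lording" fall below 1% of documents, so they would not be offered as suggestions.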
  
  = Spell Checking Analysis =
@@ -222, +219 @@

  == spellcheck.count ==
  The maximum number of suggestions to return. Note that this value also limits the number
of candidates considered as suggestions. You might need to increase this value to make sure
you always get the best suggestion, even if you plan to only use the first item in the list.
  
+ == spellcheck.alternativeTermCount ==
+ The maximum number of suggestions to return for terms that exist in the index (Document
Frequency > 0).  Specifying this instructs the spellchecker to try to make suggestions
for every term in the query.  This differs from the "spellcheck.onlyMorePopular" option in
that suggested terms need not be "more popular".  Also, if used with "spellcheck.collate",
collations may be built using the user's original query terms (whereas "spellcheck.onlyMorePopular"
will try to correct every term when building collations).  <!> [[Solr4.0]] See https://issues.apache.org/jira/browse/SOLR-2585
+ 
  == spellcheck.onlyMorePopular ==
  Only return suggestions that result in more hits for the query than the existing query.
Note that even if the given query term is correct (i.e. present in the index), a more popular
suggestion will be returned (if one exists).
  
+ == spellcheck.maxResultsForSuggest ==
+ The maximum number of results the query can return without triggering spelling suggestions
(and collations, if using "spellcheck.collate").  When using "spellcheck.extendedResults",
this value is also the threshold for determining if the "correctlySpelled" flag is false.
 (If "spellcheck.maxResultsForSuggest" is not specified, the default behavior is to generate
suggestions and to report "correctlySpelled" as "false" if at least one term is not in the index
(Document Frequency == 0) regardless of the number of results returned.)  This parameter is
especially useful in conjunction with "spellcheck.alternativeTermCount" to generate "Did you
mean?"-style suggestions for low hit-count queries.  <!> [[Solr4.0]] See https://issues.apache.org/jira/browse/SOLR-2585
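
As a rough sketch of how a client might combine these parameters, the Python below assembles a query string that enables "Did you mean?"-style behavior. The host, port, and handler path are assumptions for illustration, and the numeric values (5 and 10) are arbitrary examples rather than recommended settings.

```python
from urllib.parse import urlencode

# Request parameters named on this page; the values are illustrative only.
params = {
    "q": "jawa class lording",
    "spellcheck": "true",
    # Suggest up to 5 alternatives even for terms that exist in the index.
    "spellcheck.alternativeTermCount": 5,
    # Only suggest when the query returns 10 or fewer results.
    "spellcheck.maxResultsForSuggest": 10,
    "spellcheck.collate": "true",
    "spellcheck.extendedResults": "true",
}

query_string = urlencode(params)

# Assumed base URL; adjust to the actual Solr host and request handler.
url = "http://localhost:8983/solr/select?" + query_string
print(url)
```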
+ 
  == spellcheck.extendedResults ==
  Provide additional information about the suggestion, such as the frequency in the index.
  
  == spellcheck.collate ==
  Take the best suggestion for each token (if it exists) and construct a new query from the
suggestions. For example, if the input query was "jawa class lording" and the best suggestion
for "jawa" was "java" and "lording" was "loading", then the resulting collation would be "java
class loading". The top suggestions are used and no attempt is made to ensure the collation,
if re-run by the client, will return any results.
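
The basic collation step in the example above amounts to a per-token substitution, which might be sketched as follows. The `collate` helper is hypothetical and splits on whitespace for simplicity; Solr tokenizes the query with its own configured analysis, so this is not how the component itself is implemented.

```python
def collate(query, best_suggestions):
    """Replace each token with its best suggestion, if one exists;
    tokens without a suggestion are kept unchanged."""
    return " ".join(best_suggestions.get(token, token) for token in query.split())

# Best suggestion per misspelled token, taken from the example above.
best_suggestions = {"jawa": "java", "lording": "loading"}

collation = collate("jawa class lording", best_suggestions)
print(collation)  # java class loading
```

As the text notes, nothing in this basic step checks that the collated query would actually return results.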
  
- <!> [[Solr4.0]] [[Solr3.1]] See [[https://issues.apache.org/jira/browse/SOLR-2010|https://issues.apache.org/jira/browse/SOLR-2010]]
+ <!> [[Solr4.0]] [[Solr3.1]] See https://issues.apache.org/jira/browse/SOLR-2010
  
  spellcheck.collate can guarantee that collations will return results if re-run by the client
(applying original fq params also). This is especially helpful when there is more than one
correction per query.  There is also an option to get multiple collation suggestions and an
expanded response format.  The following three parameters enable this functionality:
  
@@ -265, +268 @@

      </lst>
  </lst>
  }}}
- ----
- See [[https://issues.apache.org/jira/browse/SOLR-2585|https://issues.apache.org/jira/browse/SOLR-2585]]
for a patch that allows Solr to provide context-sensitive spell suggestions and collations.
 With this enabled, Solr will consider replacements for terms even if they exist in the index
and/or dictionary.  This also makes it possible to use a master dictionary containing terms
from multiple fields.
- ----
  == spellcheck.accuracy ==
  <!> [[Solr4.0]] [[Solr3.1]] See https://issues.apache.org/jira/browse/LUCENE-2608
  
