lucene-solr-commits mailing list archives

From Apache Wiki <>
Subject [Solr Wiki] Update of "Suggester" by AndrzejBialecki
Date Mon, 27 Sep 2010 21:13:53 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "Suggester" page has been changed by AndrzejBialecki.


        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <str name="field">suggest</str>
+       <str name="threshold">0.005</str>
        <str name="buildOnCommit">true</str>
        <str name="sourceLocation">american-english</str>
@@ -34, +35 @@

  The look-up of matching suggestions in a dictionary is implemented by subclasses of the
Lookup class. Two implementations are included in Solr, both based on
in-memory tries: JaspellLookup and TSTLookup. Benchmarks indicate that TSTLookup provides
better performance at a lower memory cost (roughly 50% faster and 50% of memory cost). However,
JaspellLookup can provide "fuzzy" suggestions, though this functionality is not currently
exposed (it's a one-line change in JaspellLookup).
- == Configuration ==
+ = Configuration =
  The configuration snippet above shows a few common configuration parameters. Here's a complete
list of them:
- === SpellCheckComponent configuration ===
+ == SpellCheckComponent configuration ==
  * `searchComponent/@name` - arbitrary name for this component
@@ -50, +51 @@

    * `buildOnCommit` - if set to true then the Lookup data structure will be rebuilt after
commit. If false (default) then the Lookup data will be built only when requested (by URL
parameter ``). '''NOTE: currently implemented Lookup-s keep their data
in memory, so unlike spellchecker data this data is discarded on core reload and not available
until you invoke the build command, either explicitly or implicitly via commit.'''
    * `location` - location of the dictionary file. If not empty then this is a path to a
dictionary file (see below). If this value is empty then the main index will be used as a
source of terms and weights.
    * `field` - if `location` is empty then terms from this field in the index will be used
when building the trie.
+   * `threshold` - a value in [0..1] giving the minimum fraction of all documents
in which a term must appear in order to be added to the lookup dictionary.
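
The `threshold` rule described above can be illustrated with a minimal Python sketch. This is not the Suggester's actual code: the function name `filter_by_threshold`, the `doc_freqs` mapping, and the choice of an inclusive comparison (`>=`) are all assumptions made for illustration.

```python
from typing import Dict


def filter_by_threshold(doc_freqs: Dict[str, int],
                        total_docs: int,
                        threshold: float) -> Dict[str, int]:
    """Keep only terms that appear in at least `threshold` of all documents.

    doc_freqs maps each term to the number of documents containing it.
    Hypothetical helper: the inclusive comparison is an assumption.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0..1]")
    if total_docs <= 0:
        return {}
    # Minimum number of documents a term must appear in to be kept.
    min_docs = threshold * total_docs
    return {term: df for term, df in doc_freqs.items() if df >= min_docs}
```

With `threshold` set to 0.005 and an index of 1000 documents, a term found in only 2 documents would be dropped under this sketch, while a term found in 50 documents would be kept.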
+ == Dictionary file ==
+ It's a plain text file in UTF-8 encoding. Blank lines and lines that start with a '#' are
ignored. Each remaining line must consist of either a string without a literal TAB (\u0009)
character, or a string and a TAB-separated floating-point weight.
+ Example:
+ {{{
+ # This is a sample dictionary file.
+ acquire
+ accidentally\t2.0
+ accommodate\t3.0
+ }}}
+ If the weight is missing, it's assumed to be equal to 1.0.
+ Please note that the format of the file is not limited to single terms but can also contain
phrases - which is an improvement over the TermsComponent that you could also use for a simple
version of autocomplete functionality.
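
The dictionary file rules above (skip blank lines and '#' comments, optional TAB-separated weight, default weight 1.0) can be sketched as a small Python reader. This is only an illustration of the format, not the parser Solr uses; the function name `parse_dictionary` is hypothetical, and splitting on the first TAB only is an assumption.

```python
from typing import Iterable, List, Tuple


def parse_dictionary(lines: Iterable[str],
                     default_weight: float = 1.0) -> List[Tuple[str, float]]:
    """Parse dictionary lines into (entry, weight) pairs.

    Hypothetical helper illustrating the file format described above.
    """
    entries: List[Tuple[str, float]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Blank lines and lines starting with '#' are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        if "\t" in line:
            # A term or phrase followed by a TAB-separated weight.
            term, weight_text = line.split("\t", 1)
            weight = float(weight_text)
        else:
            # No weight given: fall back to the default of 1.0.
            term, weight = line, default_weight
        entries.append((term, weight))
    return entries
```

To read a real file, the lines could be supplied with `open(path, encoding="utf-8")`. Because each entry is taken as the whole text before the TAB, phrases such as "new york" pass through unchanged.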
