lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "SpellCheckComponent" by YonikSeeley
Date Wed, 21 Oct 2009 20:10:26 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SpellCheckComponent" page has been changed by YonikSeeley.
The comment on this change is: fix URLs, update syntax, move some stuff around.
http://wiki.apache.org/solr/SpellCheckComponent?action=diff&rev1=36&rev2=37

--------------------------------------------------

  <!> [[Solr1.3]]
  
- /!\ :TODO: /!\  HOOK in links to Javadocs.
- 
  <<TableOfContents>>
  
  = Introduction =
  
  The SpellCheckComponent is designed to provide inline spell checking of queries without
having to issue separate requests. Another and possibly clearer way of stating this is that
it makes query suggestions (as do well-known web search engines), for example if it thinks
the input query might have been misspelled. (Some people tend to think that "spellchecker"
is actually a misnomer, and something along the lines of "query suggest" would have been more
appropriate.)
- 
- For discussion of the development of this feature, see [[https://issues.apache.org/jira/browse/SOLR-572|SOLR-572]].
  
  The SpellCheckComponent can use the [[http://wiki.apache.org/jakarta-lucene/SpellChecker|Lucene
SpellChecker]] to give suggestions for given words, or one can implement one's own spell checker
using the SolrSpellChecker abstract base class.
  
@@ -92, +88 @@

  
  }}}
  
- When adding <str name="field">FieldName</str> be aware all fieldType processing
is done prior to the dictionary creation.  It is best to avoid a heavily processed field (ie
synonyms and stemming) to get more accurate results.  If the field has many word variations
from processing then the dictionary will be created with those in addition to more valid spell
checking data.
+ When adding {{{<str name="field">FieldName</str>}}} be aware all fieldType processing
is done prior to the dictionary creation.  It is best to avoid a heavily processed field (ie
synonyms and stemming) to get more accurate results.  If the field has many word variations
from processing then the dictionary will be created with those in addition to more valid spell
checking data.
  
  Multiple "spellchecker" instances can be configured in the same way. The currently available
spellchecker implementations are:
   * org.apache.solr.spelling.IndexBasedSpellChecker -- Create and use a spelling dictionary
that is based on the Solr index or an existing Lucene index
@@ -159, +155 @@

  A simple result using the spellcheck.q parameter. Note the spellcheck.build=true, which is
needed only once to build the index. It should not be specified for each request.
  
  {{{
- http://localhost:8983/solr/spellCheckCompRH?q=*:*&spellcheck.q=hell%20ultrashar&spellcheck=true&spellcheck.build=true
+ http://localhost:8983/solr/spell?q=*:*&spellcheck.build=true&spellcheck.q=hell%20ultrashar&spellcheck=true
  }}}
  
  {{{
@@ -189, +185 @@

  
  The spellcheck.extendedResults=true parameter provides frequency of each original term in
the index (origFreq) as well as the frequency of each suggestion in the index (frequency).
  
- '''''NOTE''': This result format differs from the non-extended one as the returned suggestions
is actually an array of lists, where each list holds the suggested term and its frequency.''
<!> [[Solr1.4]]
+ '''''NOTE''': This result format differs from the non-extended one as the returned suggestion
for a word is actually an array of lists, where each list holds the suggested term and its
frequency.'' <!> [[Solr1.4]]
  
  {{{
- http://localhost:8983/solr/spellCheckCompRH?q=*:*&spellcheck.q=hell+ultrashar&spellcheck=true&spellcheck.extendedResults=true
+ http://localhost:8983/solr/spell?q=*:*&spellcheck.q=hell+ultrashar&spellcheck=true&spellcheck.extendedResults=true
  }}}
  
  {{{
  <lst name="spellcheck">
- 	<lst name="suggestions">
+  <lst name="suggestions">
- 		<lst name="hell">
+   <lst name="hell">
- 			<int name="numFound">1</int>
+ 	<int name="numFound">1</int>
- 			<int name="startOffset">0</int>
+ 	<int name="startOffset">0</int>
- 			<int name="endOffset">4</int>
+ 	<int name="endOffset">4</int>
- 			<int name="origFreq">0</int>
+ 	<int name="origFreq">0</int>
- 			<arr name="suggestion">
+ 	<arr name="suggestion">
-                                 <lst>
- 				        <int name="frequency">1</int>
+ 	 <lst>
+ 
- 				        <str name="word">dell</str>
+ 	  <str name="word">dell</str>
+ 	  <int name="freq">2</int>
-                                 </lst>
- 			</arr>
- 		</lst>
+ 	 </lst>
+ 	</arr>
+   </lst>
- 		<lst name="ultrashar">
+   <lst name="ultrashar">
- 			<int name="numFound">1</int>
+ 	<int name="numFound">1</int>
+ 
- 			<int name="startOffset">5</int>
+ 	<int name="startOffset">5</int>
- 			<int name="endOffset">14</int>
+ 	<int name="endOffset">14</int>
- 			<int name="origFreq">0</int>
+ 	<int name="origFreq">0</int>
- 			<arr name="suggestion">
+ 	<arr name="suggestion">
+ 	 <lst>
-                                 <lst>
- 				        <int name="frequency">1</int>
- 				        <str name="word">ultrasharp</str>
+ 	  <str name="word">ultrasharp</str>
-                                 </lst>
- 			</arr>
+ 	  <int name="freq">2</int>
+ 
- 		</lst>
+ 	 </lst>
+ 	</arr>
+   </lst>
- 		<bool name="correctlySpelled">false</bool>
+   <bool name="correctlySpelled">false</bool>
- 	</lst>
+  </lst>
  </lst>
  }}}
  
@@ -232, +231 @@

  Adding the spellcheck.collate=true parameter returns a query with the misspelled terms replaced
by the top suggestions. Note that the non-spellcheckable terms such as those for range queries,
prefix queries etc. are detected and excluded for spellchecking. Such non-spellcheckable terms
are preserved in the collated output so that the original query can be run again, as is.
  
  {{{
- http://localhost:8983/solr/spellCheckCompRH?q=price:[80 TO 100] hell ultrashar&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true
+ http://localhost:8983/solr/spell?q=price:[80 TO 100] hell ultrashar&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true
  }}}
  
  {{{
  <lst name="spellcheck">
- 	<lst name="suggestions">
+  <lst name="suggestions">
- 		<lst name="hell">
+   <lst name="hell">
- 			<int name="numFound">1</int>
+ 	<int name="numFound">1</int>
- 			<int name="startOffset">18</int>
+ 	<int name="startOffset">18</int>
- 			<int name="endOffset">22</int>
+ 	<int name="endOffset">22</int>
- 			<int name="origFreq">0</int>
+ 	<int name="origFreq">0</int>
- 			<lst name="suggestion">
+ 	<arr name="suggestion">
- 				<int name="frequency">1</int>
+ 	 <lst>
- 				<str name="word">dell</str>
+ 	  <str name="word">dell</str>
- 			</lst>
+ 	  <int name="freq">2</int>
- 		</lst>
+ 	 </lst>
+ 	</arr>
+   </lst>
- 		<lst name="ultrashar">
+   <lst name="ultrashar">
- 			<int name="numFound">1</int>
+ 	<int name="numFound">1</int>
- 			<int name="startOffset">23</int>
+ 	<int name="startOffset">23</int>
- 			<int name="endOffset">32</int>
+ 	<int name="endOffset">32</int>
- 			<int name="origFreq">0</int>
+ 	<int name="origFreq">0</int>
- 			<lst name="suggestion">
+ 	<arr name="suggestion">
- 				<int name="frequency">1</int>
+ 	 <lst>
- 				<str name="word">ultrasharp</str>
+ 	  <str name="word">ultrasharp</str>
- 			</lst>
+ 	  <int name="freq">2</int>
- 		</lst>
+ 	 </lst>
+ 	</arr>
+   </lst>
- 		<bool name="correctlySpelled">false</bool>
+   <bool name="correctlySpelled">false</bool>
- 		<str name="collation">price:[80 TO 100] dell ultrasharp</str>
+   <str name="collation">price:[80 TO 100] dell ultrasharp</str>
- 	</lst>
+  </lst>
  </lst>
  }}}
  
- = Implementing a SolrSpellChecker =
+ = Implementing a new java SolrSpellChecker =
+ 
+ /!\ :TODO: /!\  HOOK in links to Javadocs.
  
  The SolrSpellChecker class provides an abstract base class for defining common spelling
constructs for use in the SpellCheckComponent.  Implementing
  classes need to define the following methods:
@@ -309, +314 @@

  <str name="buildOnOptimize">true</str>
  }}}
  
+ = History =
+ For discussion of the development of this feature, see [[https://issues.apache.org/jira/browse/SOLR-572|SOLR-572]].
+ 
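A client reading the extended result format in the diff above has to walk an array of lists for each misspelled word. The following is a minimal sketch of that, assuming the response is requested as XML and looks like the revised collate example (a `suggestion` array whose entries carry `word` and `freq`). The function name `parse_spellcheck` and the embedded `SAMPLE` string are illustrative only and are not part of Solr; a real response would normally be wrapped in a `<response>` element, which the sketch also tolerates.

```python
import xml.etree.ElementTree as ET

# Sample copied from the revised spellcheck.collate=true example in the diff.
SAMPLE = """\
<lst name="spellcheck">
 <lst name="suggestions">
  <lst name="hell">
   <int name="numFound">1</int>
   <int name="startOffset">18</int>
   <int name="endOffset">22</int>
   <int name="origFreq">0</int>
   <arr name="suggestion">
    <lst>
     <str name="word">dell</str>
     <int name="freq">2</int>
    </lst>
   </arr>
  </lst>
  <lst name="ultrashar">
   <int name="numFound">1</int>
   <int name="startOffset">23</int>
   <int name="endOffset">32</int>
   <int name="origFreq">0</int>
   <arr name="suggestion">
    <lst>
     <str name="word">ultrasharp</str>
     <int name="freq">2</int>
    </lst>
   </arr>
  </lst>
  <bool name="correctlySpelled">false</bool>
  <str name="collation">price:[80 TO 100] dell ultrasharp</str>
 </lst>
</lst>
"""


def parse_spellcheck(xml_text):
    """Return ({original_term: [(suggested_word, freq), ...]}, collation_or_None).

    Hypothetical helper: assumes the extended (array-of-lists) format.
    """
    root = ET.fromstring(xml_text)
    # Accept either the bare <lst name="spellcheck"> fragment or a full response.
    if root.get("name") == "spellcheck":
        spellcheck = root
    else:
        spellcheck = root.find(".//lst[@name='spellcheck']")
    if spellcheck is None:
        raise ValueError("no spellcheck section found")

    suggestions = spellcheck.find("lst[@name='suggestions']")
    if suggestions is None:
        raise ValueError("no suggestions list found")

    result = {}
    # Each direct <lst> child is one misspelled term from the query.
    for entry in suggestions.findall("lst"):
        words = []
        for item in entry.findall("arr[@name='suggestion']/lst"):
            word = item.findtext("str[@name='word']")
            freq = int(item.findtext("int[@name='freq']"))
            words.append((word, freq))
        result[entry.get("name")] = words

    # The collation is only present when spellcheck.collate=true was sent.
    collation = suggestions.findtext("str[@name='collation']")
    return result, collation


if __name__ == "__main__":
    terms, collation = parse_spellcheck(SAMPLE)
    for term, words in terms.items():
        print(term, words)
    print("collation:", collation)
```

The non-extended format returns plain strings inside the `suggestion` array, so this sketch would need a separate branch to handle responses requested without spellcheck.extendedResults=true.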
