lucene-solr-user mailing list archives

From "Dyer, James" <James.D...@ingrambook.com>
Subject RE: Improving Solr Spell Checker Results
Date Mon, 16 Jan 2012 18:32:50 GMT
David,

The spellchecker normally won't give suggestions for any term in your index.  So even if "wever"
is misspelled in context, if it exists in the index the spell checker will not try correcting
it.  There are 3 workarounds:
1. Use the patch included with SOLR-2585 (this is for Trunk/4.x only).  See https://issues.apache.org/jira/browse/SOLR-2585

2. Try "spellcheck.onlyMorePopular=true" in your request (http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.onlyMorePopular).
 But see the September 2, 2011 comment in SOLR-2585 about why this might not do what you'd
hope it would.

3. If you're building your spellcheck index on a <copyField />, you can add a stopword filter that
filters out all of the misspelt or rare words from the field that the dictionary is based on.
 This could be an arduous task, and it may or may not work well for your data.
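
For workaround 2, here is a minimal sketch of what such a request could look like. It assumes a Solr instance at http://localhost:8983/solr/select (a hypothetical URL) whose handler already has the spellcheck component attached; the helper name build_spellcheck_url is made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port, and handler to your own setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_spellcheck_url(query, only_more_popular=True):
    """Build a select URL that turns on spellchecking for the given query."""
    params = {
        "q": query,
        "spellcheck": "true",
        # Ask the spellchecker to consider only suggestions that are more
        # frequent in the index than the original term.
        "spellcheck.onlyMorePopular": "true" if only_more_popular else "false",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(build_spellcheck_url("Sigourney Wever"))
```

Whether this actually yields "Weaver" depends on your data and on the caveat in the SOLR-2585 comment mentioned above.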

As for your second question, I take it you're using (e)dismax with multiple fields in "qf",
right?  The only way I know to handle this is to create a <copyField> that combines
all of the fields you search across.  Base your dictionary on this combined field.  Also,
specifying "spellcheck.maxCollationTries" with a non-zero value will weed out the nonsense
word combinations that are likely to occur when doing this, ensuring that any collations provided
will indeed yield hits.  The downside to doing this, of course, is it will make your first
problem more acute in that there will be even more terms in your index that the spellchecker
will ignore entirely, even if they're misspelled in context.  Once again, SOLR-2585 is designed
to tackle this problem but it is still in its early stages, and thus far it is Trunk-only.
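
As a rough sketch of the request side of this setup, the parameters might be assembled like this. The field names "actor" and "title" and the helper build_collation_params are assumptions for illustration, and the spellchecker built on the combined field is assumed to be configured already in solrconfig.xml. Note that "spellcheck.collate" must also be on for collations to be returned.

```python
from urllib.parse import urlencode


def build_collation_params(query, fields, max_tries=5):
    """Return request parameters for an edismax query with tested collations."""
    return {
        "q": query,
        "defType": "edismax",
        # Fields searched by edismax (hypothetical names in the example below).
        "qf": " ".join(fields),
        "spellcheck": "true",
        "spellcheck.collate": "true",
        # A non-zero value makes Solr test candidate collations against the
        # index, so that only collations yielding hits are returned.
        "spellcheck.maxCollationTries": str(max_tries),
    }


params = build_collation_params("Sigourney Wevr", ["actor", "title"])
print(urlencode(params))
```

Higher values of max_tries let Solr test more candidate collations at the cost of extra work per request.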

You might also be interested in https://issues.apache.org/jira/browse/SOLR-2993 .  Although
this is unrelated to your two questions, the patch on this issue introduces a new "ConjunctionSolrSpellChecker"
which theoretically could be enhanced to do exactly what you want.  That is, you could (theoretically)
create separate dictionaries for each of the fields you're searching and let the CSSC combine
the results & generate collations, etc. 

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: David Radunz [mailto:david@boxen.net] 
Sent: Friday, January 13, 2012 11:42 PM
To: solr-user@lucene.apache.org
Subject: Improving Solr Spell Checker Results

Hey,

     Firstly I would like to thank you all for creating such a great 
searching platform. What I was wondering is whether it is possible to:

1. Have the spell checker take into account multiple words. For example 
if I search for "Sigourney Wever" it doesn't flag as a spelling issue as 
'wever' is a correctly spelled word. And if I searched for "Sigourney 
Wevr" the suggestion is "Sigourney Wever". Of course the correct 
spelling is: Sigourney Weaver
2. Have the spell checker return corrections only for dictionary items 
added on the field being searched. i.e. Searching for an actor would 
only use the dictionary fields from the actor. This makes sense on many 
levels, as when you are field searching it's useless to get a correction 
from another field as no values would match in any case.

Hopefully someone can help!

Thanks in advance,

David
