lucene-solr-user mailing list archives

From David Radunz <>
Subject Re: Improving Solr Spell Checker Results
Date Mon, 23 Jan 2012 01:54:06 GMT
Hey Erick,

     Sure. Can you explain the process for creating and uploading the
patch? I'll do it first thing tomorrow.

Thanks again for your help,


On 23/01/2012 12:51 PM, Erick Erickson wrote:
> I can't help with your *real* problem, but when looking at patches,
> if the "resolution" field isn't set to something like "fixed" it means
> that the patch has NOT been applied to any code lines. There
> also should be commit revisions specified in the comments.
> If "Fix Versions" has values, that doesn't mean the patch has
> been applied either, that's often just a statement of where
> the patch *should* go.
> And, between the time someone uploads a patch and it actually
> gets *committed*, the underlying code line can, indeed, change
> and the patch doesn't apply cleanly. Since you've already had
> to do this, could you upload your version that *does* apply
> cleanly?
> Best
> Erick
> On Sun, Jan 22, 2012 at 2:56 AM, David Radunz<>  wrote:
>> James,
>>     I worked out that I actually needed to 'apply' patch SOLR-2585, whoops.
>> So I have done that now and it seems to return 'correctlySpelled=true' for
>> 'Sigorney Wever' (when Sigorney isn't even in the dictionary). Could
>> something have changed in the trunk to make your patch no longer work? I had
>> to manually merge the setup for the test case due to a new 'hyphens' test
>> case. The settings I am using are:
>> <lst name="defaults">
>> <str name="echoParams">explicit</str>
>> <int name="rows">10</int>
>> <str name="spellcheck.onlyMorePopular">false</str>
>> <int name="spellcheck.count">10</int>
>> <str name="spellcheck.extendedResults">true</str>
>> <str name="spellcheck.collate">true</str>
>> <str name="spellcheck.collateExtendedResults">true</str>
>> <int name="spellcheck.maxCollationTries">10</int>
>> <int name="spellcheck.maxCollations">1</int>
>> <int name="spellcheck.alternativeTermCount">5</int>
>> <int name="spellcheck.maxResultsForSuggest">1</int>
>> </lst>
>> <lst name="spellchecker">
>> <str name="name">default</str>
>> <str name="field">spell</str>
>> <str name="classname">solr.DirectSolrSpellChecker</str>
>> <!-- the spellcheck distance measure used, the default is the internal
>> levenshtein -->
>> <str name="distanceMeasure">internal</str>
>> <!-- minimum accuracy needed to be considered a valid spellcheck suggestion
>> -->
>> <float name="accuracy">0.5</float>
>> <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2
>> -->
>> <int name="maxEdits">2</int>
>> <!-- the minimum shared prefix when enumerating terms -->
>> <int name="minPrefix">1</int>
>> <!-- maximum number of inspections per result. -->
>> <int name="maxInspections">5</int>
>> <!-- minimum length of a query term to be considered for correction -->
>> <int name="minQueryLength">4</int>
>> <!-- maximum threshold of documents a query term can appear to be considered
>> for correction -->
>> <float name="maxQueryFrequency">0.01</float>
>> <!-- require suggestions to occur in 0.1% of the documents -->
>> <!--
>> <float name="thresholdTokenFrequency">0.001</float>
>>       -->
>> <str name="spellcheckIndexDir">spellchecker</str>
>> <str name="buildOnCommit">true</str>
>> </lst>
>> With the query:
>> spellcheck=true&facet=on&fl=id,sku,name,format,thumbnail,release_date,url_path,price,special_price,year_made_attr_opt_combo,primary_cat_id&sort=score+desc,name+asc,year_made+desc&start=0&q=sigorney+wever+title:"sigorney+wever"^100+series_name:"sigorney+wever"^50&spellcheck.q=sigorney+wever&fq=store_id:"1"&rows=5
>> Cheers,
>> David
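
For readability, the request quoted above can be rebuilt from named parameters. This is only a minimal sketch: the helper name `build_spellcheck_params` is hypothetical, the `facet` and `fl` parameters are left out for brevity, and the values are copied from the query in the message rather than recommended.

```python
from urllib.parse import urlencode, parse_qs


def build_spellcheck_params(user_query, store_id="1", rows=5):
    """Return the request parameters as (name, value) pairs (hypothetical helper)."""
    phrase = f'"{user_query}"'
    return [
        # Main query: free text plus boosted phrase matches, as in the message.
        ("q", f"{user_query} title:{phrase}^100 series_name:{phrase}^50"),
        ("spellcheck", "true"),
        # spellcheck.q carries only the raw user text, without field syntax.
        ("spellcheck.q", user_query),
        ("fq", f'store_id:"{store_id}"'),
        ("sort", "score desc,name asc,year_made desc"),
        ("start", "0"),
        ("rows", str(rows)),
    ]


# urlencode escapes spaces, quotes, colons and carets for use in a URL.
query_string = urlencode(build_spellcheck_params("sigorney wever"))
print(query_string)

# Round-trip check: decoding gives back the original values.
decoded = parse_qs(query_string)
print(decoded["spellcheck.q"])
```

Keeping `spellcheck.q` separate from `q` means the spell checker only sees the user's words, not the field names and boosts.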
>> On 22/01/2012 2:03 AM, David Radunz wrote:
>>> James,
>>>     Thanks again for your lengthy and informative response. I updated from
>>> SVN trunk again today and was successfully able to run 'ant test'. So I
>>> proceeded with trying your suggestions (for question 1 so far):
>>> On 17/01/2012 5:32 AM, Dyer, James wrote:
>>>> David,
>>>> The spellchecker normally won't give suggestions for any term in your
>>>> index.  So even if "wever" is misspelled in context, if it exists in the
>>>> index the spell checker will not try correcting it.  There are 3
>>>> workarounds:
>>>> 1. Use the patch included with SOLR-2585 (this is for Trunk/4.x only).
>>>>   See
>>> I have tried using this with the original test case of 'Signorney Wever'.
>>> I didn't notice any difference, although I am a little unclear as to what
>>> exactly this patch does. Nor am I really clear what to set either of the
>>> options to, so I set them both to '5'. I tried to find the test case it
>>> mentions, but it's not present in .. Any
>>> suggestions?
>>>> 2. try "onlyMorePopular=true" in your request.
>>>>   (
>>>>   But see the September 2, 2011 comment in SOLR-2585 about why this might
>>>> do what you'd hope it would.
>>> Trying this did produce 'Signourney Weaver' as you would hope, but I am a
>>> little afraid of the downside. I would much prefer a context-sensitive
>>> spell check that involves the terms around the correction.
>>>> 3. If you're building your index on a <copyField />, you can add a
>>>> stopword filter that filters out all of the misspelt or rare words from the
>>>> field that the dictionary is based on.  This could be an arduous task, and
>>>> it may or may not work well for your data.
>>> I am currently using a copyField for all terms that are relevant, which is
>>> quite a lot and the dictionary would encompass a huge amount of data. Adding
>>> stopword filters would be out of the question as we presently have more than
>>> 30,000 products and this is for the initial launch, we intend to have many
>>> many more.
>>>> As for your second question, I take it you're using (e)dismax with
>>>> multiple fields in "qf", right?  The only way I know to handle this is to
>>>> create a <copyField> that combines all of the fields you search across.  Use
>>>> this combined field to base your dictionary.  Also, specifying
>>>> "spellcheck.maxCollationTries" with a non-zero value will weed out the
>>>> nonsense word combinations that are likely to occur when doing this,
>>>> ensuring that any collations provided will indeed yield hits.  The downside
>>>> to doing this, of course, is it will make your first problem more acute in
>>>> that there will be even more terms in your index that the spellchecker will
>>>> ignore entirely, even if they're misspelled in context.  Once again,
>>>> SOLR-2585 is designed to tackle this problem but it is still in its early
>>>> stages, and thus far it is Trunk-only.
>>> I tried setting spellcheck.maxCollationTries to 5 to see if it would help
>>> with the above problem, but it did not.
>>> I have now tried using it in the context of question 2. I tried searching
>>> for 'Sigorney Wever' in the series name (which it's not present in, as it's
>>> an actor):
>>> spellcheck=true&facet=on&fl=id,sku,name,format,thumbnail,release_date,url_path,price,special_price,year_made_attr_opt_combo,series_name_attr_opt_combo&sort=score+desc,release_date+desc&start=0&q=*+series_name:"signourney+wever"^100&spellcheck.q=signourney+wever&fq=store_id:"1"+AND+series_name_attr_opt_search:*signourney*wever*&rows=5&spellcheck.maxCollationTries=5
>>> Suggestions for 'Sigourney' Wever were returned, but I expected either no
>>> spelling suggestions at all or only ones drawn from series names (which I
>>> doubt exist).
>>>> You might also be interested in
>>>> .  Although this is
>>>> unrelated to your two questions, the patch on this issue introduces a new
>>>> "ConjunctionSolrSpellChecker" which theoretically could be enhanced to do
>>>> exactly what you want.  That is, you could (theoretically) create separate
>>>> dictionaries for each of the fields you're searching and let the CSSC
>>>> combine the results & generate collations, etc.
>>> During the upgrade I switched to solr.DirectSolrSpellChecker, which I
>>> presume will help with this? I am a senior developer (in
>>> Java/Perl/Python/PHP) but I have not as yet looked at any of the Solr source
>>> code. So I am in the dark when you say it could be tailored for my needs.
>>> Also, how would it work? Query wise.. Would it be like..
>>> spellcheck.series_name.q= and and so on? If so that
>>> sounds tempting to try and achieve. But if you could provide any pointers in
>>> what exactly would be required that would really help.
>>> Thanks again for your time,
>>> David
>>>> James Dyer
>>>> E-Commerce Systems
>>>> Ingram Content Group
>>>> (615) 213-4311
>>>> -----Original Message-----
>>>> From: David Radunz []
>>>> Sent: Friday, January 13, 2012 11:42 PM
>>>> To:
>>>> Subject: Improving Solr Spell Checker Results
>>>> Hey,
>>>>       Firstly I would like to thank you all for creating such a great
>>>> searching platform. What I was wondering is whether it is possible to:
>>>> 1. Have the spell checker take into account multiple words. For example
>>>> if I search for "Sigourney Wever" it doesn't flag as a spelling issue as
>>>> 'wever' is a correctly spelled word. And if I searched for "Sigourney
>>>> Wevr" the suggestion is "Sigourney Wever". Of course the correct
>>>> spelling is: Sigourney Weaver
>>>> 2. Have the spell checker return corrections only for dictionary items
>>>> added on the field being searched. i.e. Searching for an actor would
>>>> only use the dictionary fields from the actor. This makes sense on many
>>>> levels, as when you are field searching it's useless to get a correction
>>>> from another field as no values would match in any case.
>>>> Hopefully someone can help!
>>>> Thanks in advance,
>>>> David
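
One possible way to approximate question 2 without the ConjunctionSolrSpellChecker discussed in the thread is to define one spellchecker per field and pick it per request with the standard `spellcheck.dictionary` parameter. The sketch below shows only the client side and rests on assumptions: the dictionary names `actor_spell` and `series_spell` and the `actor` field are hypothetical, and each dictionary name would have to match a `<str name="name">` entry in its own `spellchecker` block in solrconfig.xml.

```python
from urllib.parse import urlencode

# Hypothetical mapping from the searched field to a spellchecker name.
# These names are assumptions, not taken from the thread.
FIELD_DICTIONARIES = {
    "actor": "actor_spell",
    "series_name": "series_spell",
}
# "default" is the spellchecker name used in the configuration quoted in the thread.
DEFAULT_DICTIONARY = "default"


def spellcheck_params(field, text):
    """Build request parameters that spell-check against the searched field's dictionary."""
    dictionary = FIELD_DICTIONARIES.get(field, DEFAULT_DICTIONARY)
    return {
        "q": f'{field}:"{text}"',
        "spellcheck": "true",
        "spellcheck.q": text,
        # Selects which configured spellchecker answers this request.
        "spellcheck.dictionary": dictionary,
        "spellcheck.collate": "true",
    }


print(urlencode(spellcheck_params("actor", "sigourney wever")))
```

The trade-off is one dictionary per searchable field, and a query spanning several fields would still need some way to merge the suggestions, which is the gap the ConjunctionSolrSpellChecker idea is meant to fill.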
