lucene-solr-user mailing list archives

From "Dyer, James" <James.D...@ingramcontent.com>
Subject RE: Can't get spelling suggestions to work properly
Date Tue, 17 Jan 2017 18:06:35 GMT
Jimi,

Generally speaking, spellcheck does not work well against fields with stemming, or other "heavy"
analysis.  I would <copyField /> to a field that is tokenized on whitespace with little
else, and use that field for spellcheck.

By default, the spellchecker does not suggest for words in the index.  So if the user misspells
a word but the misspelling is actually some other word that is indexed, it will never suggest.
You can override this behavior by specifying "spellcheck.alternativeTermCount" with a value
>0.  This is how many suggestions it should give for words that do exist in the index.
This can be the same value as "spellcheck.count", but you may wish to set it to a lower value.

I do not recommend using "spellcheck.onlyMorePopular".  It is similar to "spellcheck.alternativeTermCount",
but in my opinion, the latter gives a better experience.

You might also wish to set "spellcheck.maxResultsForSuggest".  If you set this, then the spellchecker
will not suggest anything if more results are returned than the value you specify.  This is
helpful in providing "did you mean"-style suggestions for queries that return few results.

If you would like to ensure the suggestions combine nicely into a re-written query that returns
results, then set "spellcheck.collate=true" and set "spellcheck.maxCollationTries" to
a value >0 (possibly 5-10).  This will cause it to internally check the re-written queries
(a.k.a. collations) and report back on how many results you get for each.  If you are using
"q.op=OR" or a low value for "mm", then you will likely want to override this with something
like "spellcheck.collateParam.mm=100%".  Otherwise every combination will get reported as returning
results.
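
To show how the request parameters above fit together, here is a minimal Java sketch that
collects them and renders a query string.  The class and method names are hypothetical, the
numeric values are only illustrative, and "spellcheck=true" (which enables the component for
the request) is an addition not mentioned above.  The value "100%" for the collation "mm"
override follows the Solr Reference Guide's example for requiring every term to match.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Solr or SolrJ.
class SpellcheckParamsSketch {

    /** Collects the spellcheck parameters discussed above; all values are illustrative. */
    static Map<String, String> spellcheckParams() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("spellcheck", "true");                      // assumption: enable the component
        params.put("spellcheck.count", "5");                   // suggestions for terms not in the index
        params.put("spellcheck.alternativeTermCount", "3");    // suggestions for terms that are indexed
        params.put("spellcheck.maxResultsForSuggest", "5");    // stay silent if the query has more hits
        params.put("spellcheck.collate", "true");              // build re-written queries
        params.put("spellcheck.maxCollationTries", "10");      // verify collations against the index
        params.put("spellcheck.collateParam.mm", "100%");      // collation checks require all terms
        return params;
    }

    /** URL-encodes each key and value and joins the pairs with '&'. */
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(spellcheckParams()));
    }
}
```

The resulting string would be appended to whatever search request you already send; the same
keys could equally be set as defaults on the request handler instead.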

I hope this and the other comments you've gotten help demystify spellcheck configuration.  I
do agree it is fairly complicated and frustrating to get it just right.

James Dyer
Ingram Content Group

-----Original Message-----
From: jimi.hullegard@svensktnaringsliv.se [mailto:jimi.hullegard@svensktnaringsliv.se] 
Sent: Friday, January 13, 2017 5:16 AM
To: solr-user@lucene.apache.org
Subject: RE: Can't get spelling suggestions to work properly

I just noticed why setting maxResultsForSuggest to a high value was not a good thing: it now
shows spelling suggestions even for correctly spelled words.

I think what I would need is the logic of SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, but with
a configurable limit instead of it being hard-coded to 0, i.e. just like how maxQueryFrequency
works.

/Jimi

-----Original Message-----
From: jimi.hullegard@svensktnaringsliv.se [mailto:jimi.hullegard@svensktnaringsliv.se] 
Sent: Friday, January 13, 2017 5:56 PM
To: solr-user@lucene.apache.org
Subject: RE: Can't get spelling suggestions to work properly

Hi Alessandro,

Thanks for your explanation. It helped a lot, although setting "spellcheck.maxResultsForSuggest"
to a value higher than zero was not enough. I also had to set "spellcheck.alternativeTermCount".
With that done, I now get suggestions when searching for 'mycet' (a misspelling of the Swedish
word 'mycket' that didn't return suggestions before).

However, I'm still not able to fully understand how to configure this properly, because with
this change there are now other misspelled searches that no longer give suggestions. The
problem here is stemming, I suspect: the main search fields use stemming, so in some cases
one can get lots of results for spellings that don't exist in the index at all (or at least
not in the spelling field). How can I configure this component so that those suggestions are
still included? Do I need to set maxResultsForSuggest to a really high number, like
Integer.MAX_VALUE? I feel that such a setting would defeat the purpose of that parameter,
in a way. But I'm not sure how else to solve this.

Also, there is one other thing I wonder about the spelling suggestions that you might have
the answer to. Is there a way to make the logic case-insensitive, but the presentation
case-sensitive? For example, a search for 'georg washington' would now return 'george washington'
as a suggestion, but 'George Washington' would be even better.

Regards
/Jimi


-----Original Message-----
From: alessandro.benedetti [mailto:abenedetti@apache.org] 
Sent: Thursday, January 12, 2017 5:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Can't get spelling suggestions to work properly

Hi Jimi,
taking a look at the *maxQueryFrequency* param:

Your understanding is correct.

1) We don't provide suggestions for a term if we set the param to 1 and the term has a doc
frequency of at least 1.

2) We don't provide suggestions if the doc frequency of the term is greater than the max
limit set.

Let us explore the code :

if (suggestMode == SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX && docfreq > 0) {
  return new SuggestWord[0];
}
// If we are working in "not in index" mode with a document frequency > 0, we get no
// corrections.

int maxDoc = ir.maxDoc();

if (maxQueryFrequency >= 1f && docfreq > maxQueryFrequency) {
  return new SuggestWord[0];
} else if (docfreq > (int) Math.ceil(maxQueryFrequency * (float) maxDoc)) {
  return new SuggestWord[0];
}
// Then maxQueryFrequency, as you correctly stated, enters the game.
    
...
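
To make the two branches of that check concrete, here is a small self-contained sketch that
isolates them in one method.  The class and method names are hypothetical; only the two
conditions are taken from the excerpt above.  A value of 1 or more is compared against the
doc frequency directly, while a value below 1 is treated as a fraction of maxDoc.

```java
// Hypothetical stand-alone model of the frequency check; not the Lucene class itself.
class MaxQueryFrequencySketch {

    /**
     * Returns true when the query term is frequent enough that, following the two
     * conditions quoted above, no suggestions would be returned for it.
     */
    static boolean tooFrequentToCorrect(float maxQueryFrequency, int docfreq, int maxDoc) {
        if (maxQueryFrequency >= 1f && docfreq > maxQueryFrequency) {
            return true;   // absolute limit: more documents than the configured count
        } else if (docfreq > (int) Math.ceil(maxQueryFrequency * (float) maxDoc)) {
            return true;   // relative limit: more documents than the configured fraction of the index
        }
        return false;
    }

    public static void main(String[] args) {
        // Absolute limit of 5 documents in an index of 100 documents.
        System.out.println(tooFrequentToCorrect(5f, 6, 100));
        System.out.println(tooFrequentToCorrect(5f, 5, 100));
        // Relative limit of 25% of an index of 100 documents, i.e. 25 documents.
        System.out.println(tooFrequentToCorrect(0.25f, 26, 100));
        System.out.println(tooFrequentToCorrect(0.25f, 25, 100));
    }
}
```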

Let's explore how you can end up in the first scenario :

if (maxResultsForSuggest == null || hits <= maxResultsForSuggest) {
  SuggestMode suggestMode = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX;
  if (onlyMorePopular) {
    suggestMode = SuggestMode.SUGGEST_MORE_POPULAR;
  } else if (alternativeTermCount > 0) {
    suggestMode = SuggestMode.SUGGEST_ALWAYS;
  }

You did not set maxResultsForSuggest (nor onlyMorePopular or alternativeTermCount),
so you ended up in:
SuggestMode suggestMode = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX;
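
The selection can be tried out in isolation with the sketch below.  It is a simplified model
and not the Solr source: the enum is redeclared locally, the method name is hypothetical, and
returning an empty Optional when there are too many hits is an assumption standing in for
"no suggestions are requested at all".

```java
import java.util.Optional;

// Hypothetical stand-alone model of the mode selection; not the Solr component itself.
class SuggestModeSketch {

    // Local copy of the three mode names used in the excerpt above.
    enum SuggestMode { SUGGEST_WHEN_NOT_IN_INDEX, SUGGEST_MORE_POPULAR, SUGGEST_ALWAYS }

    /**
     * Picks the suggest mode from the request parameters, or returns empty when the query
     * already produced more hits than maxResultsForSuggest allows.
     */
    static Optional<SuggestMode> chooseMode(Integer maxResultsForSuggest, long hits,
                                            boolean onlyMorePopular, int alternativeTermCount) {
        if (maxResultsForSuggest == null || hits <= maxResultsForSuggest) {
            SuggestMode suggestMode = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX;
            if (onlyMorePopular) {
                suggestMode = SuggestMode.SUGGEST_MORE_POPULAR;
            } else if (alternativeTermCount > 0) {
                suggestMode = SuggestMode.SUGGEST_ALWAYS;
            }
            return Optional.of(suggestMode);
        }
        return Optional.empty();   // assumption: too many hits, so nothing is suggested
    }

    public static void main(String[] args) {
        // Nothing set: the default mode applies regardless of the hit count.
        System.out.println(chooseMode(null, 1000, false, 0));
        // alternativeTermCount > 0 switches to suggesting for indexed terms as well.
        System.out.println(chooseMode(5, 3, false, 2));
        // More hits than maxResultsForSuggest: no mode is chosen.
        System.out.println(chooseMode(5, 6, false, 2));
    }
}
```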

From the Solr Javadoc:

  /**
   * If left unspecified, the default behavior will prevail.  That is, "correctlySpelled" will
   * be false and suggestions will be returned only if one or more of the query terms are absent
   * from the dictionary and/or index.  If set to zero, the "correctlySpelled" flag will be false
   * only if the response returns zero hits.  If set to a value greater than zero, suggestions
   * will be returned even if hits are returned (up to the specified number).  This number also
   * will serve as the threshold in determining the value of "correctlySpelled".  Specifying a
   * value greater than zero is useful for creating "did-you-mean" suggestions for queries that
   * return a low number of hits.
   */
  public static final String SPELLCHECK_MAX_RESULTS_FOR_SUGGEST = SPELLCHECK_PREFIX + "maxResultsForSuggest";

You probably want to bypass the other parameters and just set the proper maxResultsForSuggest
param for your spellchecker.

Cheers



--
View this message in context: http://lucene.472066.n3.nabble.com/Can-t-get-spelling-suggestions-to-work-properly-tp4310079p4313685.html
Sent from the Solr - User mailing list archive at Nabble.com.

