lucene-solr-user mailing list archives

From <jimi.hulleg...@svensktnaringsliv.se>
Subject RE: Can't get spelling suggestions to work properly
Date Fri, 13 Jan 2017 10:55:51 GMT
Hi Alessandro,

Thanks for your explanation. It helped a lot. However, setting "spellcheck.maxResultsForSuggest"
to a value higher than zero was not enough; I also had to set "spellcheck.alternativeTermCount".
With that done, I now get suggestions when searching for 'mycet' (a misspelling of the Swedish
word 'mycket' that didn't return suggestions before).
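
For reference, a minimal sketch of a request carrying both parameters. The parameter names are the ones mentioned above; the value 5 for each, the "q" and "spellcheck=true" parameters, and the class and method names are only assumptions for illustration, not my actual settings.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds the query string of a spellcheck-enabled request.
final class SpellcheckParamsSketch {

    static String queryString(String userQuery, int maxResultsForSuggest, int alternativeTermCount) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);
        params.put("spellcheck", "true");
        params.put("spellcheck.maxResultsForSuggest", Integer.toString(maxResultsForSuggest));
        params.put("spellcheck.alternativeTermCount", Integer.toString(alternativeTermCount));
        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Example values only (assumed, not recommended settings).
        System.out.println(queryString("mycet", 5, 5));
    }
}
```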

However, I still don't fully understand how to configure this properly, because with
this change there are now other misspelled searches that no longer give suggestions. The
problem here is stemming, I suspect: the main search fields use stemming, so
in some cases one can get lots of results for spellings that don't exist in the index at
all (or at least not in the spelling field). How can I configure this component so that those
suggestions are still included? Do I need to set maxResultsForSuggest to a really high number,
like Integer.MAX_VALUE? I feel that such a setting would defeat the purpose of that parameter,
in a way. But I'm not sure how else to solve this.

Also, there is one other thing I wonder about the spelling suggestions that you might have
the answer to. Is there a way to make the logic case insensitive, but the presentation case
sensitive? For example, a search for 'georg washington' would now return 'george washington'
as a suggestion, but 'George Washington' would be even better.

Regards
/Jimi


-----Original Message-----
From: alessandro.benedetti [mailto:abenedetti@apache.org] 
Sent: Thursday, January 12, 2017 5:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Can't get spelling suggestions to work properly

Hi Jimi,
taking a look at the *maxQueryFrequency* param:

Your understanding is correct.

1) we don't provide suggestions for a misspelling if we set the param to 1 and the term
has a doc freq of at least 1.

2) we don't provide misspelled suggestions if the doc frequency of the term is greater than
the max limit set.

Let us explore the code:

    if (suggestMode==SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX && docfreq > 0) {
      return new SuggestWord[0];
    }
    // If we are working in "not in index" mode with a document frequency > 0,
    // we get no corrections for the misspelling.
    
    int maxDoc = ir.maxDoc();
    
    if (maxQueryFrequency >= 1f && docfreq > maxQueryFrequency) {
      return new SuggestWord[0];
    } else if (docfreq > (int) Math.ceil(maxQueryFrequency * (float)maxDoc)) {
      return new SuggestWord[0];
    }
    // Then maxQueryFrequency, as you correctly stated, comes into play.
    
...
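
The threshold check above can be isolated into a small standalone sketch. The class and method names are hypothetical and the Lucene types are left out; only the comparison logic is taken from the snippet. A value of 1 or more is compared to the doc frequency as an absolute count, and a value below 1 is treated as a fraction of maxDoc.

```java
// Hypothetical standalone version of the maxQueryFrequency check shown above.
final class MaxQueryFrequencySketch {

    /** Returns true when the term is frequent enough that no suggestions are produced. */
    static boolean tooFrequent(float maxQueryFrequency, int docFreq, int maxDoc) {
        if (maxQueryFrequency >= 1f && docFreq > maxQueryFrequency) {
            // Absolute threshold: the term occurs in more documents than allowed.
            return true;
        }
        // Fractional threshold: the value is scaled by the number of documents.
        return docFreq > (int) Math.ceil(maxQueryFrequency * (float) maxDoc);
    }

    public static void main(String[] args) {
        // Example values only: 1000 documents, threshold of 50% of the index.
        System.out.println(tooFrequent(0.5f, 501, 1000));
        System.out.println(tooFrequent(0.5f, 500, 1000));
    }
}
```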

Let's explore how you can end up in the first scenario:

    if (maxResultsForSuggest == null || hits <= maxResultsForSuggest) {
      SuggestMode suggestMode = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX;
      if (onlyMorePopular) {
        suggestMode = SuggestMode.SUGGEST_MORE_POPULAR;
      } else if (alternativeTermCount > 0) {
        suggestMode = SuggestMode.SUGGEST_ALWAYS;
      }

You did not set maxResultsForSuggest (nor onlyMorePopular or alternativeTermCount),
so you ended up with:

    SuggestMode suggestMode = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX;
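
To make the selection easier to experiment with, here is a standalone sketch of the same branching. The enum is a local stand-in for the Lucene one and the class and method names are hypothetical. Returning null when the hit count exceeds maxResultsForSuggest is an assumption on my part, since the quoted snippet does not show that branch.

```java
// Hypothetical standalone version of the suggest mode selection quoted above.
final class SuggestModeSketch {

    // Local stand-in for Lucene's SuggestMode enum.
    enum SuggestMode { SUGGEST_WHEN_NOT_IN_INDEX, SUGGEST_MORE_POPULAR, SUGGEST_ALWAYS }

    /**
     * Returns the mode picked by the quoted logic, or null when the hit count
     * is above maxResultsForSuggest (assumed here to mean "no lookup at all").
     */
    static SuggestMode choose(Integer maxResultsForSuggest, long hits,
                              boolean onlyMorePopular, int alternativeTermCount) {
        if (maxResultsForSuggest == null || hits <= maxResultsForSuggest) {
            SuggestMode suggestMode = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX;
            if (onlyMorePopular) {
                suggestMode = SuggestMode.SUGGEST_MORE_POPULAR;
            } else if (alternativeTermCount > 0) {
                suggestMode = SuggestMode.SUGGEST_ALWAYS;
            }
            return suggestMode;
        }
        return null; // assumption: the elided branch is not shown in the original
    }

    public static void main(String[] args) {
        // Only maxResultsForSuggest set: the mode stays "when not in index".
        System.out.println(choose(5, 3, false, 0));
        // Adding alternativeTermCount > 0 switches the mode to "always".
        System.out.println(choose(5, 3, false, 5));
    }
}
```

This matches what Jimi reports in his reply: raising maxResultsForSuggest alone leaves the mode at SUGGEST_WHEN_NOT_IN_INDEX, so a term with a doc frequency above zero still gets no suggestions until alternativeTermCount is also set.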

From the Solr Javadoc:

   * If left unspecified, the default behavior will prevail.  That is, "correctlySpelled" will
   * be false and suggestions will be returned only if one or more of the query terms are absent
   * from the dictionary and/or index.  If set to zero, the "correctlySpelled" flag will be false
   * only if the response returns zero hits.  If set to a value greater than zero, suggestions
   * will be returned even if hits are returned (up to the specified number).  This number also
   * will serve as the threshold in determining the value of "correctlySpelled".  Specifying a
   * value greater than zero is useful for creating "did-you-mean" suggestions for queries that
   * return a low number of hits.
   * </p>
   */
  public static final String SPELLCHECK_MAX_RESULTS_FOR_SUGGEST = SPELLCHECK_PREFIX + "maxResultsForSuggest";

You probably want to bypass the other parameters and just set the proper maxResultsForSuggest
param for your spellchecker.

Cheers



--
View this message in context: http://lucene.472066.n3.nabble.com/Can-t-get-spelling-suggestions-to-work-properly-tp4310079p4313685.html
Sent from the Solr - User mailing list archive at Nabble.com.
