lucene-solr-dev mailing list archives

From "Shalin Shekhar Mangar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1676) spellcheck.count has confusing default and documentation
Date Mon, 21 Dec 2009 11:19:18 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793158#action_12793158 ]

Shalin Shekhar Mangar commented on SOLR-1676:
---------------------------------------------

Although it is not documented anywhere, SpellCheckComponent passes max(spellcheck.count, 5)
to the Lucene spellchecker; see AbstractLuceneSpellChecker, line 141 in trunk.
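
As a rough illustration of the arithmetic, here is a minimal Python sketch. It combines the floor of 5 described above with the factor of 10 mentioned in the issue description quoted below. The function names are hypothetical, and this is a simplified model, not Solr or Lucene code.

```python
def effective_num_suggestions(spellcheck_count: int, floor: int = 5) -> int:
    """Value handed to the Lucene spellchecker: max(spellcheck.count, 5)."""
    return max(spellcheck_count, floor)


def candidates_examined(spellcheck_count: int, factor: int = 10) -> int:
    """Candidates fetched internally, assuming the x10 factor from the issue description."""
    return effective_num_suggestions(spellcheck_count) * factor


# Show how the requested count maps to the internal numbers in this model.
for count in (1, 5, 10):
    print(count, effective_num_suggestions(count), candidates_examined(count))
```

Under these assumptions, any spellcheck.count from 1 to 5 leads to the same number of candidates being examined, and only larger values widen the pool.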

bq. The effect is that with a low value for spellcheck.count you might miss good hits. In
other words, the first item with spellcheck.count==1 is not always the same item as with e.g.
spellcheck.count==10. 

That is true. It is a trade-off between accuracy and performance. We cannot avoid it without
fetching all results (or a large number of them) internally and scoring each of them with a
distance metric, which can make spellchecking very slow.
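
The trade-off can be sketched with a toy model in Python. It assumes a two-stage process: the index returns candidates in some ranked order, only the first count * fetch_factor of them are kept, and those are re-ranked by edit distance. The candidate list, its ordering, the `suggest` helper, and the small fetch_factor of 2 are all hypothetical and chosen to keep the example short. The floor of 5 is left out for the same reason.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def suggest(query, ranked_candidates, count, fetch_factor):
    """Keep the first count * fetch_factor candidates, re-rank them by distance."""
    fetched = ranked_candidates[: count * fetch_factor]
    reranked = sorted(fetched, key=lambda word: levenshtein(query, word))
    return reranked[:count]


# Hypothetical index order: the closest word happens to be ranked last.
ranked = ["documents", "documented", "monument", "document"]

print(suggest("documnet", ranked, count=1, fetch_factor=2))
print(suggest("documnet", ranked, count=2, fetch_factor=2))
```

In this toy data the closest word, "document", is only fetched when count is 2, so the first suggestion differs between the two calls. Fetching every candidate would remove the discrepancy, but at the cost of computing a distance for each one.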

Do you have any suggestion on how we could improve the documentation?



> spellcheck.count has confusing default and documentation
> --------------------------------------------------------
>
>                 Key: SOLR-1676
>                 URL: https://issues.apache.org/jira/browse/SOLR-1676
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 1.4
>            Reporter: Daniel Naber
>            Priority: Minor
>
> It seems spellcheck.count does not just limit the number of results returned, as the
> documentation claims. Instead, this value is given to the Lucene SpellChecker class which
> multiplies it by 10 and then only fetches the first spellcheck.count*10 candidates, ignoring
> all others. The effect is that with a low value for spellcheck.count you might miss good hits.
> In other words, the first item with spellcheck.count==1 is not always the same item as with
> e.g. spellcheck.count==10.
> The fix could be to fix the documentation (the comments in the sample solrconfig.xml)
> to mention this and use a better default.
> The Lucene SpellChecker class says about the numSug parameter: "Thus, you should set
> this value to *at least* 5 for a good suggestion."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

