lucene-dev mailing list archives

From "James Dyer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2571) IndexBasedSpellChecker "thresholdTokenFrequency" fails with a ClassCastException on startup
Date Mon, 06 Jun 2011 15:48:59 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044917#comment-13044917
] 

James Dyer commented on SOLR-2571:
----------------------------------

{quote}
what makes this 'decision' of correctlySpelled? Do you know?
{quote}

I took a quick look to find out.  It's more complicated than I thought!  Here's the basic gist
(I think!):
 - If the instance of SolrSpellChecker returns frequency data and all suggestions have frequency
>0, TRUE.
 - If the instance of SolrSpellChecker returns frequency data and any suggestion has frequency
== 0, FALSE.
 - If the instance of SolrSpellChecker returns NO frequency data but has suggestions, OMIT.
 - If the instance of SolrSpellChecker returns NO suggestions, FALSE. 
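
The four rules above can be sketched roughly as follows. This is only an illustration, not the actual SpellCheckComponent.toNamedList() code: the class, enum, and parameter names are hypothetical, and checking the "no suggestions" case first is an assumption, since the list does not say which rule takes precedence.

```java
/** Hypothetical sketch of the four rules; not the actual Solr code. */
final class CorrectlySpelledSketch {

    /** Three outcomes: report true, report false, or leave the flag out. */
    enum Decision { TRUE, FALSE, OMIT }

    /**
     * @param suggestionCount number of suggestions the checker returned
     * @param frequencies     one frequency per suggestion, or null when the
     *                        checker returns no frequency data (assumed to
     *                        be non-negative when present)
     */
    static Decision decide(int suggestionCount, int[] frequencies) {
        if (suggestionCount == 0) {
            return Decision.FALSE;   // no suggestions at all
        }
        if (frequencies == null) {
            return Decision.OMIT;    // suggestions, but no frequency data
        }
        for (int freq : frequencies) {
            if (freq == 0) {
                return Decision.FALSE;   // at least one suggestion has frequency 0
            }
        }
        return Decision.TRUE;        // every suggestion has frequency > 0
    }
}
```

Under these assumptions, a checker that never supplies frequency data can only ever produce OMIT or FALSE, which would be consistent with the discrepancy discussed next.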

Possibly this isn't fully accurate but I'm at least mostly correct here.  Seems like the discrepancy
with DirectSolrSpellChecker is because it isn't returning frequency info?

This all happens in SpellCheckComponent.toNamedList() ... I'm guessing the code here uses
the presence or absence of frequency data as a kind of proxy indicator of whether it's
dealing with IndexBasedSpellChecker or FileBasedSpellChecker.  Possibly it would be better
if each instance of SolrSpellChecker had an "isCorrectlySpelled()" method that toNamedList()
could call?  Maybe I should go open another jira issue for that?
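
One possible shape for that idea, purely as a sketch: the interface, the two implementations, and the method signature below are all hypothetical stand-ins and are not part of Solr's SolrSpellChecker API. A null return is assumed to mean "leave correctlySpelled out of the response".

```java
/** Hypothetical stand-in for a spell checker; not Solr's actual API. */
interface CheckerSketch {
    /** Returns TRUE or FALSE, or null when the flag should be omitted. */
    Boolean isCorrectlySpelled(int suggestionCount, int[] frequencies);
}

/** Stand-in for an index-backed checker, which has frequency data. */
final class IndexCheckerSketch implements CheckerSketch {
    @Override
    public Boolean isCorrectlySpelled(int suggestionCount, int[] frequencies) {
        if (suggestionCount == 0) {
            return Boolean.FALSE;
        }
        for (int freq : frequencies) {
            if (freq == 0) {
                return Boolean.FALSE;
            }
        }
        return Boolean.TRUE;
    }
}

/** Stand-in for a file-backed checker, which has no frequency data. */
final class FileCheckerSketch implements CheckerSketch {
    @Override
    public Boolean isCorrectlySpelled(int suggestionCount, int[] frequencies) {
        if (suggestionCount == 0) {
            return Boolean.FALSE;
        }
        return null; // the frequencies argument is ignored; nothing to report
    }
}
```

With something like this, the response-building code would ask the checker directly instead of inferring the checker type from whether frequency data happens to be present.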


> IndexBasedSpellChecker "thresholdTokenFrequency" fails with a ClassCastException on startup
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-2571
>                 URL: https://issues.apache.org/jira/browse/SOLR-2571
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 1.4.1, 3.1, 4.0
>            Reporter: James Dyer
>            Priority: Minor
>              Labels: whereIsHossManWhenYouNeedHim
>             Fix For: 3.3, 4.0
>
>         Attachments: SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.solr3.2.patch
>
>
> When parsing the configuration for "thresholdTokenFrequency", the IndexBasedSpellChecker
tries to pull a Float from the DataConfig.xml-derived NamedList.  However, this comes through
as a String.  Therefore, a ClassCastException is always thrown whenever this parameter is
specified.  The code ought to be doing "Float.parseFloat(...)" on the value.
> This looks like a nice feature to use in cases where the data contains misspelled or rare words
leading to spurious "correct" queries.  I would have liked to have used this with a project
we just completed; however, this bug prevented that.  This issue came up recently in the User's
mailing list so I am raising an issue now.
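
The failure described in the issue can be reproduced outside Solr with a few lines of plain Java. This is a minimal sketch and not the attached patch: the class and method names are hypothetical, and accepting a Number as well as a String is an extra assumption beyond the "Float.parseFloat(...)" suggestion in the description.

```java
/** Hypothetical illustration of the cast failure and a parse-based fix. */
final class ThresholdParseSketch {

    /** The failing pattern: casting a config value that is really a String. */
    static float readByCast(Object raw) {
        return (Float) raw; // throws ClassCastException when raw is a String
    }

    /**
     * A tolerant alternative: accept a Number as-is, otherwise parse the
     * value's string form. The raw value must not be null.
     */
    static float readByParse(Object raw) {
        if (raw instanceof Number) {
            return ((Number) raw).floatValue();
        }
        return Float.parseFloat(raw.toString());
    }
}
```

Calling readByCast("0.01") fails with a ClassCastException, while readByParse("0.01") returns the float value 0.01f.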

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

