lucene-dev mailing list archives

From "James Dyer (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2993) Integrate WordBreakSpellChecker with Solr
Date Mon, 09 Jan 2012 16:00:40 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182577#comment-13182577 ]

James Dyer commented on SOLR-2993:
----------------------------------

Okke,

Thanks for looking at this patch.  Here are a few comments:

{quote}
if both word parts resulted in suggestions, the collation made no sense.
{quote}
This is a problem with collations in general:  By default, the spellchecker simply mashes the
top corrections together, often resulting in nonsense.  The solution is to set
"spellcheck.maxCollationTries" to a non-zero value.  Doing so will cause the spellchecker to
test the collation possibilities against the index, so that the collations it returns are
guaranteed to generate hits.
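
As a rough illustration of the setting described above, the sketch below builds a request URL with collation vetting turned on.  The base URL, the function name and the value 5 are assumptions chosen for the example; "spellcheck", "spellcheck.collate" and "spellcheck.maxCollationTries" are the SpellCheckComponent request parameters.

```python
from urllib.parse import urlencode


def build_spellcheck_url(base_url, query, max_collation_tries=5):
    """Build a query URL that asks for vetted collations.

    base_url and the default of 5 tries are assumptions for illustration.
    """
    params = [
        ("q", query),
        ("spellcheck", "true"),
        ("spellcheck.collate", "true"),
        # Any non-zero value makes the spellchecker test candidate
        # collations against the index before returning them.
        ("spellcheck.maxCollationTries", str(max_collation_tries)),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical host and handler path.
print(build_spellcheck_url("http://localhost:8983/solr/select", "spe llcheck"))
```

A higher value gives the spellchecker more candidates to try before giving up, at the cost of extra queries against the index.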

{quote}
"spe llcheck" would give suggestions "spa" and "spellcheck" and collate this into "spa spellcheck"
{quote}
This is surprising to me and might indicate a bug.  This patch is designed to carefully ensure
that when building collations, the corrections do not overlap one another.  For instance, if
the query is "q=spe llcheck" and the spellchecker gives corrections of "spe>spa" and
"spe llcheck>spellcheck", it should not collate these to "q=spa spellcheck" because "spe"
overlaps with "spe llcheck".  So if you can describe in detail what you're indexing and
querying (maybe paste the resulting xml), it would help me figure out what's going on.  Better
yet, if you can write a failing unit test and post a patch...
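
The overlap rule can be sketched as follows.  This is only an illustration of the idea, not the code in the patch: the function name and the (start, end, replacement) representation of a correction, with token positions and an exclusive end, are assumptions made for the example.

```python
def apply_corrections(tokens, corrections):
    """Apply corrections to the query tokens, or return None if any overlap.

    Each correction is a hypothetical (start, end, replacement) tuple that
    covers tokens[start:end]; this is not the patch's data structure.
    """
    ordered = sorted(corrections, key=lambda c: c[0])
    # Two corrections overlap when the later one starts before the
    # earlier one has ended.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[0] < prev[1]:
            return None
    out = []
    pos = 0
    for start, end, replacement in ordered:
        out.extend(tokens[pos:start])  # untouched tokens before the correction
        out.append(replacement)
        pos = end
    out.extend(tokens[pos:])  # untouched tokens after the last correction
    return " ".join(out)


tokens = ["spe", "llcheck"]
spa = (0, 1, "spa")                # spe > spa
combined = (0, 2, "spellcheck")    # spe llcheck > spellcheck
print(apply_corrections(tokens, [spa, combined]))
print(apply_corrections(tokens, [combined]))
```

With both corrections the function refuses to build a collation, because "spe" is covered twice; with the word-break correction alone it yields "spellcheck".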

{quote}
I never got any results back when one of the parts had a typo. So "spe llchek" would not give
any suggestions.
{quote}
This patch does not have the ability to first correct a word fragment and then combine it
with another fragment to make a corrected word.  Possibly this would be a good next step once
what we've got here is worked out.

{quote}
it would also be handy if "spell check" would result in the suggestion "spellcheck".  Or is
this already possible?
{quote}
This is the core of what this issue (really LUCENE-3523) is all about, provided that "spellcheck"
is in the dictionary and index you're using.
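
A minimal sketch of that case, assuming the dictionary is simply a set of known terms: adjacent query words are joined and offered as a suggestion only if the joined form is in the dictionary.  The function name and data structures are hypothetical and are not the WordBreakSpellChecker API.

```python
def suggest_combinations(tokens, dictionary):
    """Suggest joining adjacent tokens when the joined word is a known term.

    dictionary is a plain set of terms, assumed here for illustration.
    Returns (start, end, suggestion) tuples with an exclusive end position.
    """
    suggestions = []
    for i in range(len(tokens) - 1):
        joined = tokens[i] + tokens[i + 1]
        if joined in dictionary:
            suggestions.append((i, i + 2, joined))
    return suggestions


print(suggest_combinations(["spell", "check"], {"spellcheck", "solr"}))
```

If "spellcheck" is missing from the dictionary, the function returns an empty list, which matches the proviso above.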
                
> Integrate WordBreakSpellChecker with Solr
> -----------------------------------------
>
>                 Key: SOLR-2993
>                 URL: https://issues.apache.org/jira/browse/SOLR-2993
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud, spellchecker
>    Affects Versions: 4.0
>            Reporter: James Dyer
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-2993.patch
>
>
> A SpellCheckComponent enhancement, leveraging the WordBreakSpellChecker from LUCENE-3523:
> - Detect spelling errors resulting from misplaced whitespace without the use of shingle-based
>   dictionaries.
> - Seamlessly integrate word-break suggestions with single-word spelling corrections from
>   the existing FileBased-, IndexBased- or Direct- spell checkers.
> - Provide collation support for word-break errors including cases where the user has
>   a mix of single-word spelling errors and word-break errors in the same query.
> - Provide shard support.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

