lucene-dev mailing list archives

From "James Dyer (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-2993) Integrate WordBreakSpellChecker with Solr
Date Thu, 29 Dec 2011 17:59:30 GMT

     [ https://issues.apache.org/jira/browse/SOLR-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2993:
-----------------------------

    Attachment: SOLR-2993.patch

This patch adds the features described in this issue.  Users can configure a word-break dictionary
in solrconfig.xml like this:

{code:xml}
<lst name="spellchecker">
 <str name="name">wordbreak</str>
 <str name="classname">solr.WordBreakSolrSpellChecker</str>      
 <str name="field">lowerfilt</str>
 <str name="combineWords">true</str>
 <str name="breakWords">true</str>
 <int name="maxChanges">10</int>
</lst>
{code}
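
For context, here is a rough sketch of how that entry might sit next to an existing single-word dictionary inside the spellcheck component definition. The "default" entry and its use of solr.DirectSolrSpellChecker are assumptions for illustration only and are not taken from the patch. The dictionary names simply need to match whatever is passed as "spellcheck.dictionary".

{code:xml}
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
 <!-- Assumed single-word dictionary; any existing spellchecker could take its place -->
 <lst name="spellchecker">
  <str name="name">default</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <str name="field">lowerfilt</str>
 </lst>
 <!-- Word-break dictionary, as configured above -->
 <lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">lowerfilt</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">10</int>
 </lst>
</searchComponent>
{code}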

Users can also specify multiple "spellcheck.dictionary" parameters.  All specified dictionaries
are consulted and their results are interleaved (this is handled by the new ConjunctionSolrSpellChecker).
Collations are built from combinations of suggestions from the different spellcheckers, with care taken
that multiple overlapping corrections do not occur in the same collation.

{code:xml}
<requestHandler name="spellCheckWithWordbreak" class="org.apache.solr.handler.component.SearchHandler">
 <lst name="defaults">
  <str name="spellcheck.dictionary">default</str>
  <str name="spellcheck.dictionary">wordbreak</str>
  <str name="spellcheck.count">20</str>
 </lst>
 <arr name="last-components">
  <str>spellcheck</str>
 </arr>
</requestHandler>
{code}
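
Collations are only returned when they are requested, so a handler that wants the combined collations would presumably also enable them in its defaults. The variant below is a sketch under that assumption: the "spellcheck.collate", "spellcheck.maxCollations" and "spellcheck.maxCollationTries" parameters are existing SpellCheckComponent options, but the values shown are illustrative and not part of the patch.

{code:xml}
<requestHandler name="spellCheckWithWordbreak" class="org.apache.solr.handler.component.SearchHandler">
 <lst name="defaults">
  <str name="spellcheck">true</str>
  <str name="spellcheck.dictionary">default</str>
  <str name="spellcheck.dictionary">wordbreak</str>
  <str name="spellcheck.count">20</str>
  <!-- Illustrative values: request collations built from both dictionaries -->
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.maxCollations">5</str>
  <str name="spellcheck.maxCollationTries">10</str>
 </lst>
 <arr name="last-components">
  <str>spellcheck</str>
 </arr>
</requestHandler>
{code}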

A future enhancement (outside the scope of this issue) would be to extend ConjunctionSolrSpellChecker
to allow arbitrary dictionary combinations, for instance when a user wants to query two fields
and have separate dictionaries consulted for each field.  With this patch, however,
ConjunctionSolrSpellChecker is intended for adding Word-Break suggestions to Single-Word
suggestions.
                
> Integrate WordBreakSpellChecker with Solr
> -----------------------------------------
>
>                 Key: SOLR-2993
>                 URL: https://issues.apache.org/jira/browse/SOLR-2993
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud, spellchecker
>    Affects Versions: 4.0
>            Reporter: James Dyer
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-2993.patch
>
>
> A SpellCheckComponent enhancement, leveraging the WordBreakSpellChecker from LUCENE-3523:
> - Detect spelling errors resulting from misplaced whitespace without the use of shingle-based dictionaries.
> - Seamlessly integrate word-break suggestions with single-word spelling corrections from the existing FileBased-, IndexBased- or Direct- spell checkers.
> - Provide collation support for word-break errors including cases where the user has a mix of single-word spelling errors and word-break errors in the same query.
> - Provide shard support.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

