lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3653) Support Smart Simplified Chinese in Solr - include clean-up bigramming filter
Date Fri, 20 Jul 2012 16:41:35 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419330#comment-13419330 ]

Robert Muir commented on SOLR-3653:
-----------------------------------

{quote}
Because parts of it are also words, which should be searchable.
{quote}

Says who? There are no real word boundaries in this language.

If you want to start indexing individual characters, just use StandardTokenizer.

None of your examples are "failures" of this tokenizer. This is what it has in its dictionary!
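
To illustrate the per-character indexing mentioned above, here is a minimal sketch in plain Java. It is not Lucene code and does not call StandardTokenizer; the class name HanUnigramSketch and the helper hanUnigrams are hypothetical and exist only for this example. It mimics one aspect of what StandardTokenizer does with Chinese text, namely emitting each Han character as its own token, and it ignores everything else that tokenizer handles (Latin words, numbers, other scripts).

```java
import java.util.ArrayList;
import java.util.List;

public class HanUnigramSketch {

    // Hypothetical helper: returns one token per Han code point and
    // silently skips every other character (punctuation, Latin, digits).
    static List<String> hanUnigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                tokens.add(new String(Character.toChars(cp)));
            }
            // Advance by code point so supplementary characters stay intact.
            i += Character.charCount(cp);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Each Han character becomes a separate, individually searchable token.
        System.out.println(hanUnigrams("我是中国人"));
    }
}
```

With this kind of output every character is searchable on its own, at the cost of losing the dictionary-based word segmentation that the Smart Chinese tokenizer provides.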
                
> Support Smart Simplified Chinese in Solr - include clean-up bigramming filter
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-3653
>                 URL: https://issues.apache.org/jira/browse/SOLR-3653
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Lance Norskog
>         Attachments: SOLR-3653.patch, SmartChineseType.pdf
>
>
> The "Smart" Simplified Chinese toolkit in lucene/analysis/smartcn has no Solr factories. Also, since it is a statistical algorithm, it is not perfect.
> This patch supplies factories and a schema.xml type for the existing Lucene Smart Chinese implementation, and includes a "fixup" class to handle the occasional mistake made by the Smart Chinese implementation.


        


