lucene-dev mailing list archives

From "Lance Norskog (JIRA)" <>
Subject [jira] [Updated] (SOLR-3653) Custom bigramming filter to handle Smart Chinese edge cases
Date Mon, 24 Sep 2012 02:49:07 GMT


Lance Norskog updated SOLR-3653:

    Attachment: translations_450.five2thirteen.txt
> Custom bigramming filter to handle Smart Chinese edge cases
> -----------------------------------------------------------
>                 Key: SOLR-3653
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Lance Norskog
>         Attachments: SmartChineseType.pdf, SOLR-3653.patch, translations_450.five2thirteen.txt, translations_first_500.quad.txt, translations_first_500.trigrams.txt
> The "Smart" Simplified Chinese toolkit in lucene/analysis/smartcn does not work in some edge cases. It fails to split certain words that were not part of its dictionary or training data.
> This patch supplies a bigramming class to handle these occasional mistakes. The algorithm creates bigrams out of all "words" longer than two ideograms.
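
The bigramming rule described above can be sketched in plain Java. This is only an illustration, not the code in SOLR-3653.patch: the class and method names (`BigramSketch`, `bigrams`) are hypothetical, and the choice of overlapping bigrams is an assumption, since the description does not say whether the bigrams overlap. Tokens are measured in Unicode code points so that ideograms outside the Basic Multilingual Plane count as one character each.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not taken from the SOLR-3653 patch.
class BigramSketch {

    /**
     * Returns the token unchanged if it has at most two code points;
     * otherwise returns its overlapping bigrams (an assumed policy).
     */
    static List<String> bigrams(String token) {
        // Work on code points, not UTF-16 chars, so surrogate pairs stay intact.
        int[] cps = token.codePoints().toArray();
        List<String> out = new ArrayList<>();
        if (cps.length <= 2) {
            out.add(token);
            return out;
        }
        // Slide a window of width 2 across the token, one code point at a time.
        for (int i = 0; i + 1 < cps.length; i++) {
            out.add(new String(cps, i, 2));
        }
        return out;
    }

    public static void main(String[] args) {
        // A seven-ideogram "word" that a segmenter might have left unsplit.
        System.out.println(bigrams("中华人民共和国"));
        // A two-ideogram word passes through untouched.
        System.out.println(bigrams("中国"));
    }
}
```

In a real analysis chain this logic would live in a Lucene TokenFilter placed after the smartcn tokenizer; the sketch leaves that wiring out so it needs no Lucene dependency.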

