lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1466) CharFilter - normalize characters before tokenizer
Date Thu, 11 Jun 2009 03:49:07 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718291#action_12718291 ]

Robert Muir commented on LUCENE-1466:
-------------------------------------

Just as an alternative, I have a different mechanism as part of the LUCENE-1488 patch I am
working on. But maybe it's good to have options, since my approach depends on the ICU library.

Below is an excerpt from the javadoc.

A TokenFilter that transforms text with ICU.

ICU provides text-transformation functionality via its Transliteration API.
Although script conversion is its most common use, a transliterator can actually perform a
more general class of tasks. 
...
Some useful transformations for search are built-in:
* Conversion from Traditional to Simplified Chinese characters
* Conversion from Hiragana to Katakana
* Conversion from Fullwidth to Halfwidth forms.
...
Example usage:
    stream = new ICUTransformFilter(stream, Transliterator.getInstance("Traditional-Simplified"));
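
As a rough illustration of the kind of per-character mapping such a transform performs, here is a minimal sketch of the Hiragana to Katakana case using only the Java standard library. It is not the ICU implementation and not code from the LUCENE-1488 patch. The class name HiraganaToKatakana and the method convert are hypothetical, and the sketch assumes a plain fixed-offset mapping between the two Unicode blocks is enough, whereas a real ICU transliterator applies its own rule set.

```java
public class HiraganaToKatakana {
    // Hiragana U+3041..U+3096 and Katakana U+30A1..U+30F6 are laid out in
    // parallel in Unicode, so each Hiragana letter maps to its Katakana
    // counterpart by adding a fixed offset of 0x60.
    private static final int OFFSET = 0x60;

    // Hypothetical helper: returns a copy of text with Hiragana letters
    // replaced by Katakana; every other character is passed through unchanged.
    static String convert(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x3041 && c <= 0x3096) {
                out.append((char) (c + OFFSET));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The word "hiragana" written in Hiragana, given as Unicode escapes
        // so the source file stays ASCII-only.
        String hiragana = "\u3072\u3089\u304C\u306A";
        System.out.println(convert(hiragana));
    }
}
```

In an analysis chain, the same idea would be applied to each token's text, which is roughly where a filter like the one described above would sit; with ICU, the mapping itself would come from the Transliterator instance rather than hand-written code like this.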


> CharFilter - normalize characters before tokenizer
> --------------------------------------------------
>
>                 Key: LUCENE-1466
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1466
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Analysis
>    Affects Versions: 2.4
>            Reporter: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1466.patch, LUCENE-1466.patch
>
>
> This proposes to import the CharFilter that was introduced in Solr 1.4.
> For details, please see:
> - SOLR-822
> - http://www.nabble.com/Proposal-for-introducing-CharFilter-to20327007.html
