lucene-dev mailing list archives

From "Tom Burton-West (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-2906) Filter to process output of ICUTokenizer and create overlapping bigrams for CJK
Date Fri, 04 Feb 2011 20:31:30 GMT
Filter to process output of ICUTokenizer and create overlapping bigrams for CJK 
--------------------------------------------------------------------------------

                 Key: LUCENE-2906
                 URL: https://issues.apache.org/jira/browse/LUCENE-2906
             Project: Lucene - Java
          Issue Type: New Feature
          Components: Analysis
            Reporter: Tom Burton-West
            Priority: Minor


The ICUTokenizer produces unigrams for CJK. We would like to use the ICUTokenizer but have
overlapping bigrams created for CJK, as in the CJKAnalyzer. This filter would take the output
of the ICUTokenizer, read the ScriptAttribute, and produce overlapping bigrams for selected
scripts (Han, Kana).
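
To illustrate the intended behavior, here is a minimal sketch of the bigramming step in plain
Java. It is not the proposed filter and does not use the Lucene TokenStream API: the Token
class, the script names ("Han", "Hiragana", "Katakana", "Latin") and the bigram() helper are
all hypothetical stand-ins for tokens that carry a ScriptAttribute. It also assumes that
consecutive tokens in the list are adjacent in the text; a real filter would need to check
offsets so that unigrams separated by punctuation are not joined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CjkBigramSketch {

    // Hypothetical stand-in for a token tagged with its script.
    static final class Token {
        final String text;
        final String script;

        Token(String text, String script) {
            this.text = text;
            this.script = script;
        }
    }

    // Assumption: these script names select the tokens to be bigrammed.
    static boolean isBigramScript(String script) {
        return script.equals("Han")
            || script.equals("Hiragana")
            || script.equals("Katakana");
    }

    // Replaces each run of selected-script unigrams with overlapping bigrams.
    // Tokens in other scripts, and a lone selected-script unigram, pass through.
    static List<String> bigram(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Token t = tokens.get(i);
            if (!isBigramScript(t.script)) {
                out.add(t.text);
                i++;
                continue;
            }
            // Find the end (exclusive) of the run of selected-script tokens.
            int j = i;
            while (j < tokens.size() && isBigramScript(tokens.get(j).script)) {
                j++;
            }
            if (j - i == 1) {
                out.add(t.text);
            } else {
                for (int k = i; k + 1 < j; k++) {
                    out.add(tokens.get(k).text + tokens.get(k + 1).text);
                }
            }
            i = j;
        }
        return out;
    }

    public static void main(String[] args) {
        // One Latin word followed by four Han unigrams, written as Unicode
        // escapes so the source compiles regardless of file encoding.
        List<Token> input = Arrays.asList(
            new Token("lucene", "Latin"),
            new Token("\u4e2d", "Han"),
            new Token("\u6587", "Han"),
            new Token("\u5206", "Han"),
            new Token("\u8bcd", "Han"));
        System.out.println(bigram(input));
    }
}
```

For the input lucene, 中, 文, 分, 词 this sketch yields lucene, 中文, 文分, 分词: the Latin token is
left alone and the four Han unigrams become three overlapping bigrams.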

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

