lucene-dev mailing list archives

From "Cheolgoo Kang (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-444) StandardTokenizer loses Korean characters
Date Tue, 04 Oct 2005 14:26:50 GMT
StandardTokenizer loses Korean characters
-----------------------------------------

         Key: LUCENE-444
         URL: http://issues.apache.org/jira/browse/LUCENE-444
     Project: Lucene - Java
        Type: Bug
  Components: Analysis  
    Reporter: Cheolgoo Kang
    Priority: Minor


When StandardAnalyzer (specifically StandardTokenizer) is used on a Korean text stream,
StandardTokenizer drops the Korean characters. This is because the CJK token definition in the
StandardTokenizer.jj JavaCC file has no range covering the Korean syllables defined in the
Unicode character map.
This patch adds one line, the Korean syllables range 0xAC00~0xD7AF, to the StandardTokenizer.jj
code.
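
As a rough illustration of the range involved (not the patch itself, which edits the JavaCC
grammar), the following Java sketch checks whether characters fall inside 0xAC00~0xD7AF.
The class HangulRangeCheck and its methods are hypothetical names used only for this example
and are not part of Lucene; the cross-check against Character.UnicodeBlock assumes a standard
JDK.

```java
// Sketch only: HangulRangeCheck is a hypothetical helper, not Lucene code.
final class HangulRangeCheck {
    // Bounds of the Korean syllables range quoted in the report.
    static final char HANGUL_START = '\uAC00';
    static final char HANGUL_END = '\uD7AF';

    // True if c lies inside the reported Korean syllables range.
    static boolean isHangulSyllable(char c) {
        return c >= HANGUL_START && c <= HANGUL_END;
    }

    // Counts the chars of text that lie inside the reported range.
    static int countHangulSyllables(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (isHangulSyllable(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Two Korean syllables (U+D55C, U+AE00) followed by Latin text.
        String sample = "\uD55C\uAE00 text";
        System.out.println("Hangul syllables found: " + countHangulSyllables(sample));

        // Cross-check one character against the JDK's own Unicode block table.
        boolean sameBlock = Character.UnicodeBlock.of('\uD55C')
                == Character.UnicodeBlock.HANGUL_SYLLABLES;
        System.out.println("JDK agrees on block: " + sameBlock);
    }
}
```

A tokenizer whose character classes omit this range has no rule that matches such characters,
which is consistent with the behavior described above.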

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

