lucene-dev mailing list archives

From "Youngho Cho" <youn...@nannet.co.kr>
Subject Re: [jira] Created: (LUCENE-444) StandardTokenizer loses Korean characters
Date Wed, 05 Oct 2005 02:38:58 GMT
Hello,

Is there any plan to add this patch to the Lucene core?
I am using CJKAnalyzer, but I hope to switch to StandardAnalyzer.

Thanks,

Youngho
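
P.S. In case it is useful, here is a rough sketch of how I check the current behavior; the
field name and sample string are arbitrary, and I am assuming the usual Token-based
iteration (TokenStream.next() and Token.termText()). A second small sketch of the
0xAC00~0xD7AF range itself follows the quoted issue below.

import java.io.StringReader;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Prints the tokens StandardAnalyzer produces for a mixed Korean/English string.
// With the current StandardTokenizer grammar the Hangul syllables are dropped;
// with the LUCENE-444 patch applied they should appear in the output.
public class KoreanTokenCheck {
    public static void main(String[] args) throws Exception {
        String text = "\uD55C\uAE00 lucene test";    // "한글 lucene test", arbitrary sample
        StandardAnalyzer analyzer = new StandardAnalyzer();
        TokenStream stream = analyzer.tokenStream("contents", new StringReader(text));
        Token token;
        while ((token = stream.next()) != null) {    // classic Token-based iteration
            System.out.println(token.termText());
        }
    }
}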

----- Original Message ----- 
From: "Cheolgoo Kang (JIRA)" <jira@apache.org>
To: <java-dev@lucene.apache.org>
Sent: Tuesday, October 04, 2005 11:26 PM
Subject: [jira] Created: (LUCENE-444) StandardTokenizer loses Korean characters


> StandardTokenizer loses Korean characters
> -----------------------------------------
> 
>          Key: LUCENE-444
>          URL: http://issues.apache.org/jira/browse/LUCENE-444
>      Project: Lucene - Java
>         Type: Bug
>   Components: Analysis  
>     Reporter: Cheolgoo Kang
>     Priority: Minor
> 
> 
> While using StandardAnalyzer, specifically StandardTokenizer, with a Korean text stream,
> StandardTokenizer ignores the Korean characters. This is because the CJK token definition
> in the StandardTokenizer.jj JavaCC file doesn't cover the range of Korean syllables
> defined in the Unicode character map.
> This patch adds one line covering 0xAC00~0xD7AF, the Korean syllables range, to the
> StandardTokenizer.jj grammar.
> 
> -- 
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the administrators:
>    http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see:
>    http://www.atlassian.com/software/jira
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
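
For reference, the 0xAC00~0xD7AF range in the patch corresponds to the Unicode Hangul
Syllables block. Here is a minimal standalone check of that range; the class and method
names are only for illustration and are not part of the patch.

// Illustrative check of the Hangul Syllables block (U+AC00..U+D7AF), the range the
// LUCENE-444 patch adds to the CJK token definition in StandardTokenizer.jj.
public class HangulRange {

    static boolean isHangulSyllable(char c) {
        return c >= '\uAC00' && c <= '\uD7AF';
    }

    public static void main(String[] args) {
        char[] samples = { '\uD55C', '\uAE00', 'a', '1' };   // 한, 글, plus two ASCII chars
        for (char c : samples) {
            System.out.println("U+" + Integer.toHexString(c).toUpperCase()
                    + " -> " + isHangulSyllable(c));
        }
    }
}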