From: "Otis Gospodnetic (JIRA)"
To: java-dev@lucene.apache.org
Date: Wed, 5 Oct 2005 05:54:47 +0200 (CEST)
Subject: [jira] Resolved: (LUCENE-444) StandardTokenizer loses Korean characters

     [ http://issues.apache.org/jira/browse/LUCENE-444?page=all ]

Otis Gospodnetic resolved LUCENE-444:
-------------------------------------

    Fix Version: 1.9
     Resolution: Fixed

Committed. Thanks Cheolgoo.

> StandardTokenizer loses Korean characters
> -----------------------------------------
>
>          Key: LUCENE-444
>          URL: http://issues.apache.org/jira/browse/LUCENE-444
>      Project: Lucene - Java
>         Type: Bug
>   Components: Analysis
>     Reporter: Cheolgoo Kang
>     Priority: Minor
>      Fix For: 1.9
>  Attachments: StandardTokenizer_Korean.patch
>
> When StandardAnalyzer, especially StandardTokenizer, is used on a Korean text stream, StandardTokenizer ignores the Korean characters. This is because the definition of the CJK token in the StandardTokenizer.jj JavaCC file does not include a range covering the Korean syllables described in the Unicode character map.
> This patch adds one line to StandardTokenizer.jj for 0xAC00~0xD7AF, the Korean syllables range.
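
For readers who want to see what the 0xAC00~0xD7AF range covers, here is a minimal Java sketch. It is not the patch itself and does not touch StandardTokenizer.jj; the class name HangulRangeSketch and its methods are hypothetical, made up for illustration. It only checks whether a character falls inside the range quoted in the issue and cross-checks one character against the JDK's own Character.UnicodeBlock.HANGUL_SYLLABLES.

```java
// Illustrative sketch only: HangulRangeSketch and its methods are
// hypothetical names, not part of Lucene or of the attached patch.
final class HangulRangeSketch {

    // Bounds taken from the issue description: 0xAC00~0xD7AF.
    static final char HANGUL_FIRST = '\uAC00';
    static final char HANGUL_LAST = '\uD7AF';

    // True if c lies inside the range the patch adds to the CJK token.
    static boolean inPatchedRange(char c) {
        return c >= HANGUL_FIRST && c <= HANGUL_LAST;
    }

    // Counts how many chars of text fall inside the patched range.
    static int countInPatchedRange(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (inPatchedRange(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Three Hangul syllables (U+D55C, U+AD6D, U+C5B4) followed by ASCII.
        String sample = "\uD55C\uAD6D\uC5B4 text";
        System.out.println(countInPatchedRange(sample));

        // Cross-check one character against the JDK's Unicode block table.
        System.out.println(
            Character.UnicodeBlock.of('\uD55C')
                == Character.UnicodeBlock.HANGUL_SYLLABLES);
    }
}
```

A tokenizer whose CJK definition lacks this range would have no token rule matching those three characters, which is consistent with the behavior reported above.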