Message-ID: <1721628115.1131784683995.JavaMail.jira@ajax.apache.org>
Date: Sat, 12 Nov 2005 09:38:03 +0100 (CET)
From: "Erik Hatcher (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Closed: (LUCENE-461) StandardTokenizer splitting all of Korean words into separate characters
In-Reply-To: <2089312067.1131433039563.JavaMail.jira@ajax.apache.org>

     [ http://issues.apache.org/jira/browse/LUCENE-461?page=all ]

Erik Hatcher closed LUCENE-461:
-------------------------------

> StandardTokenizer splitting all of Korean words into separate characters
> ------------------------------------------------------------------------
>
>          Key: LUCENE-461
>          URL: http://issues.apache.org/jira/browse/LUCENE-461
>      Project: Lucene - Java
>         Type: Bug
>   Components: Analysis
>  Environment: Analyzing Korean text with Apache Lucene, esp. with StandardAnalyzer.
>     Reporter: Cheolgoo Kang
>     Priority: Minor
>      Fix For: 1.9
>  Attachments: StandardTokenizer_KoreanWord.patch, TestStandardAnalyzer_KoreanWord.patch
>
> StandardTokenizer splits every Korean word into separate single-character tokens. For example, "안녕하세요" is one Korean word that means "Hello", but StandardAnalyzer separates it into five tokens: "안", "녕", "하", "세", "요".
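
The behaviour reported in this issue can be illustrated with a small standalone sketch. It is not Lucene code and not the attached patch: the class name HangulTokenSketch and its methods are hypothetical, the sample word 안녕하세요 ("Hello") is chosen here for illustration, and the sketch assumes a Korean word can be approximated as a run of code units in the Unicode Hangul Syllables block (U+AC00 to U+D7A3). perCharacter mimics the splitting described in the report; groupedRuns shows the kind of result the reporter is asking for.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a plain-Java sketch, not Lucene's StandardTokenizer.
class HangulTokenSketch {

    // True for precomposed Hangul syllables (U+AC00..U+D7A3).
    static boolean isHangulSyllable(char c) {
        return c >= '\uAC00' && c <= '\uD7A3';
    }

    // Reported behaviour: each Hangul syllable becomes its own token.
    static List<String> perCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isHangulSyllable(c)) {
                tokens.add(String.valueOf(c));
            }
        }
        return tokens;
    }

    // Desired behaviour: consecutive Hangul syllables form a single token.
    static List<String> groupedRuns(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isHangulSyllable(c)) {
                run.append(c);
            } else if (run.length() > 0) {
                // A non-Hangul character ends the current run.
                tokens.add(run.toString());
                run.setLength(0);
            }
        }
        if (run.length() > 0) {
            tokens.add(run.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Unicode escapes for 안녕하세요, to avoid source-encoding problems.
        String hello = "\uC548\uB155\uD558\uC138\uC694";
        System.out.println("per character: " + perCharacter(hello));
        System.out.println("grouped runs:  " + groupedRuns(hello));
    }
}
```

Both methods skip non-Hangul characters entirely, which keeps the sketch short; a real tokenizer would also have to handle Latin text, digits, and Hangul Jamo outside the precomposed block.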