Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-dev@lucene.apache.org
Message-ID: <1721628115.1131784683995.JavaMail.jira@ajax.apache.org>
Date: Sat, 12 Nov 2005 09:38:03 +0100 (CET)
From: "Erik Hatcher (JIRA)" <jira@apache.org>
To: java-dev@lucene.apache.org
Subject: [jira] Closed: (LUCENE-461) StandardTokenizer splitting all of Korean
 words into separate characters
In-Reply-To: <2089312067.1131433039563.JavaMail.jira@ajax.apache.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

     [ http://issues.apache.org/jira/browse/LUCENE-461?page=all ]
     
Erik Hatcher closed LUCENE-461:
-------------------------------


> StandardTokenizer splitting all of Korean words into separate characters
> ------------------------------------------------------------------------
>
>          Key: LUCENE-461
>          URL: http://issues.apache.org/jira/browse/LUCENE-461
>      Project: Lucene - Java
>         Type: Bug
>   Components: Analysis
>  Environment: Analyzing Korean text with Apache Lucene, esp. with StandardAnalyzer.
>     Reporter: Cheolgoo Kang
>     Priority: Minor
>      Fix For: 1.9
>  Attachments: StandardTokenizer_KoreanWord.patch, TestStandardAnalyzer_KoreanWord.patch
>
> StandardTokenizer splits every Korean word into separate single-character tokens. For example, "안녕하세요" is one Korean word that means "Hello", but StandardAnalyzer separates it into five tokens: "안", "녕", "하", "세", "요".
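
The difference between the reported and the expected behavior can be illustrated with a small plain-Java sketch. It does not use Lucene and is not the attached patch: the class name HangulTokenSketch and both methods are hypothetical, and the sample input assumes the word in the report is the five-syllable greeting 안녕하세요 (written as Unicode escapes). The sketch only treats the precomposed Hangul Syllables block as Korean text, which is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lucene code and not the attached patch.
public class HangulTokenSketch {

    // True if the code point is in the precomposed Hangul Syllables block.
    static boolean isHangulSyllable(int cp) {
        return Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    // Reported behavior: each Hangul syllable becomes its own token.
    // Non-Hangul characters are simply skipped in this sketch.
    static List<String> perCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(HangulTokenSketch::isHangulSyllable)
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Expected behavior: a run of consecutive syllables forms a single token.
    // Any non-Hangul character ends the current run and is otherwise ignored.
    static List<String> groupedRuns(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (isHangulSyllable(cp)) {
                run.appendCodePoint(cp);
            } else if (run.length() > 0) {
                tokens.add(run.toString());
                run.setLength(0);
            }
        });
        if (run.length() > 0) {
            tokens.add(run.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed sample word: U+C548 U+B155 U+D558 U+C138 U+C694.
        String hello = "\uC548\uB155\uD558\uC138\uC694";
        System.out.println("per character: " + perCharacter(hello));
        System.out.println("grouped runs:  " + groupedRuns(hello));
    }
}
```

With the assumed sample word, perCharacter yields five one-syllable tokens, matching the behavior described in the report, while groupedRuns yields the whole word as one token, which is what the reporter expects.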
