lucene-dev mailing list archives

From "Cheolgoo Kang (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-461) StandardTokenizer splitting all of Korean words into separate characters
Date Tue, 08 Nov 2005 06:57:19 GMT
StandardTokenizer splitting all of Korean words into separate characters
------------------------------------------------------------------------

         Key: LUCENE-461
         URL: http://issues.apache.org/jira/browse/LUCENE-461
     Project: Lucene - Java
        Type: Bug
  Components: Analysis  
 Environment: Analyzing Korean text with Apache Lucene, esp. with StandardAnalyzer.
    Reporter: Cheolgoo Kang
    Priority: Minor


StandardTokenizer splits every Korean word into separate single-character tokens. For example,
"안녕하세요" is a single Korean word meaning "Hello", but StandardAnalyzer splits it
into five tokens: "안", "녕", "하", "세", "요".
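
The difference between the reported and the expected output can be illustrated with a small standalone sketch. It does not use Lucene and is not StandardTokenizer's actual grammar. The class name HangulTokenSketch and its methods are hypothetical, and the sketch assumes that a Korean word can be approximated as a run of consecutive code points in the Unicode Hangul Syllables block.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is NOT Lucene code.
public class HangulTokenSketch {

    // True if the code point lies in the Unicode Hangul Syllables block.
    static boolean isHangulSyllable(int codePoint) {
        return Character.UnicodeBlock.of(codePoint)
                == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    // Mimics the reported behavior: each Hangul syllable becomes its own token.
    static List<String> perCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(HangulTokenSketch::isHangulSyllable)
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Mimics the expected behavior: a run of consecutive Hangul syllables
    // is kept together as one token (an assumption made for this sketch).
    static List<String> groupedRuns(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isHangulSyllable(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-Hangul character ends the current run.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String word = "안녕하세요";
        System.out.println(perCharacter(word)); // five single-syllable tokens
        System.out.println(groupedRuns(word));  // one token for the whole word
    }
}
```

For "안녕하세요", perCharacter returns the five tokens listed above, while groupedRuns returns the word as a single token.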

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

