lucene-dev mailing list archives

From "Samphan Raruenrom (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-503) Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene
Date Wed, 01 Mar 2006 10:45:39 GMT
Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene
---------------------------------------------------------------

         Key: LUCENE-503
         URL: http://issues.apache.org/jira/browse/LUCENE-503
     Project: Lucene - Java
        Type: New Feature
  Components: Analysis  
    Versions: 1.4    
    Reporter: Samphan Raruenrom


Thai text has no spaces between words. Usually, a dictionary-based algorithm is used to
break a string into words. For Lucene to be usable for Thai, an Analyzer that knows how to
break Thai text into words is needed.

I've implemented such an Analyzer, ThaiAnalyzer, using ICU4J's DictionaryBasedBreakIterator
for word breaking. I'll upload the code later.
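
Until the code is attached, the word-breaking step can be illustrated with a minimal sketch.
This is not the ThaiAnalyzer code: it assumes the JDK's own java.text.BreakIterator as a
stand-in for ICU4J's DictionaryBasedBreakIterator, and the class name ThaiWordBreakSketch and
the method segment are hypothetical. How well Thai is segmented depends on the locale data
shipped with the JDK in use.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not the ThaiAnalyzer from this issue.
class ThaiWordBreakSketch {

    // Splits text at word boundaries and keeps only the pieces that
    // contain at least one letter or digit (drops spaces and punctuation).
    static List<String> segment(String text) {
        // Assumption: the JDK word iterator for the Thai locale stands in
        // for ICU4J's DictionaryBasedBreakIterator.
        BreakIterator it = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        it.setText(text);

        List<String> words = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                words.add(piece);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Mixed Thai and English input; the Thai run contains no spaces.
        System.out.println(segment("ภาษาไทย hello world"));
    }
}
```

An Analyzer built on this idea would emit each returned piece as one token, so that a query
for a single Thai word can match inside a longer unspaced run of text.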

I'm normally a C++ programmer and very new to Java. Please review the code for any problems.
One possible problem is that it requires ICU4J. I don't know whether this is OK.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



