lucene-dev mailing list archives

From "Samphan Raruenrom (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-503) Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene
Date Tue, 30 May 2006 04:52:31 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-503?page=comments#action_12413756 ] 

Samphan Raruenrom commented on LUCENE-503:
------------------------------------------

All the code has been tested with Lucene 2.0.0.
Thanks, Art, for the info/URL. I didn't know about Pichai's work before I started this project.
However, I heard about NECTEC's SansarnLook when I visited them and talked about my ThaiAnalyzer.
My goal is for the code to be included in Lucene so that Thai works out of the box,
and nobody has to reinvent the wheel again.

> Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene
> ---------------------------------------------------------------
>
>          Key: LUCENE-503
>          URL: http://issues.apache.org/jira/browse/LUCENE-503
>      Project: Lucene - Java
>         Type: New Feature
>   Components: Analysis
>     Versions: 1.4
>     Reporter: Samphan Raruenrom
>  Attachments: TestThaiAnalyzer.java, ThaiAnalyzer.java, ThaiWordFilter.java
>
> Thai text doesn't have spaces between words. Usually, a dictionary-based algorithm is used
> to break a string into words. For Lucene to be usable for Thai, an Analyzer that knows how to
> break Thai text into words is needed.
> I've implemented such an Analyzer, ThaiAnalyzer, using ICU4j's DictionaryBasedBreakIterator
> for word breaking. I'll upload the code later.
> I'm normally a C++ programmer and very new to Java. Please review the code for any problems.
> One possible problem is that it requires ICU4j. I don't know whether this is OK.
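The quoted description says a dictionary-based algorithm is usually used to break Thai text into words. As a rough illustration only, here is a minimal Java sketch of one of the simplest such techniques, greedy longest matching. It is not the algorithm inside ICU4j's DictionaryBasedBreakIterator or the attached ThaiAnalyzer; the class name `LongestMatchSegmenter` and the four-word dictionary are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: greedy longest-matching word segmentation against a
 * dictionary. This is NOT the algorithm used by ICU4j; it only illustrates
 * dictionary-based breaking for text written without spaces.
 */
public class LongestMatchSegmenter {

    /** Splits text into words, always taking the longest dictionary word at the current position. */
    public static List<String> segment(String text, Set<String> dictionary) {
        int maxLen = 0;
        for (String w : dictionary) {
            maxLen = Math.max(maxLen, w.length());
        }
        List<String> words = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = -1;
            // Try the longest candidate first, shrinking until a dictionary word matches.
            for (int len = Math.min(maxLen, text.length() - pos); len > 0; len--) {
                if (dictionary.contains(text.substring(pos, pos + len))) {
                    end = pos + len;
                    break;
                }
            }
            if (end < 0) {
                // Unknown text: emit one char so the loop always makes progress.
                // (A real Thai breaker would avoid separating combining vowel/tone marks.)
                end = pos + 1;
            }
            words.add(text.substring(pos, end));
            pos = end;
        }
        return words;
    }

    public static void main(String[] args) {
        // Tiny hypothetical dictionary; real breakers ship far larger word lists.
        Set<String> dict = Set.of("ตัด", "คำ", "ภาษา", "ไทย");
        System.out.println(segment("ตัดคำภาษาไทย", dict)); // prints [ตัด, คำ, ภาษา, ไทย]
    }
}
```

Greedy longest matching is easy to follow, but it can pick the wrong split when a long dictionary word swallows the start of the next word; more careful breakers look ahead and compare alternative segmentations.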
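The description relies on ICU4j's DictionaryBasedBreakIterator, which is typically reached through `com.ibm.icu.text.BreakIterator.getWordInstance(...)`; its iteration pattern (`setText`, `first`, `next`, `DONE`) mirrors the JDK's `java.text.BreakIterator`. The sketch below swaps in the JDK class so it runs without the ICU4j jar. Whether a given JRE actually applies dictionary-based breaking to Thai is an assumption that may not hold, so the exact boundaries can differ. The class name `ThaiBreakDemo` and the letter-or-digit filter are illustrative assumptions, not code from the attachments.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ThaiBreakDemo {

    /** Returns the segments a word BreakIterator reports, skipping pure whitespace/punctuation runs. */
    public static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) range is one segment.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // Keep only segments containing at least one letter or digit (a tokenizer-like filter).
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Output depends on the JRE's Thai support; without a Thai dictionary,
        // each space-free run may come back as a single segment.
        System.out.println(words("ภาษาไทย ง่ายนิดเดียว", Locale.forLanguageTag("th")));
    }
}
```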

-- 
This message is automatically generated by JIRA.