lucene-dev mailing list archives

From "Samphan Raruenrom (JIRA)" <>
Subject [jira] Commented: (LUCENE-503) Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene
Date Wed, 12 Apr 2006 05:16:23 GMT

Samphan Raruenrom commented on LUCENE-503:

I've changed the code to use java.text.BreakIterator instead of ICU4j, removing the dependency
on ICU4j. The ThaiAnalyzer has been tested intensively by several groups of developers in at least
two production systems (by To-Be-One Technology, which supports the development), so it is quite
stable. The code is quite small because I tried to make it as efficient and easy to read as possible.
It has been tested with Lucene 1.4 and, more recently, with Lucene 1.9.1.
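A minimal sketch of the JDK-only word breaking the comment describes, assuming nothing beyond the standard `java.text.BreakIterator` API. This is not the ThaiAnalyzer code attached to the issue: the class name `ThaiWordBreakDemo` and the letter-or-digit filter are illustrative. `BreakIterator.getWordInstance` with a Thai locale returns a word iterator, and an analyzer could turn each segment that is not pure whitespace or punctuation into a token. The exact Thai boundaries depend on the break data shipped with the JRE.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ThaiWordBreakDemo {

    /** Splits text into word segments using the JDK's locale-sensitive word BreakIterator. */
    static List<String> breakWords(String text, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        List<String> result = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = text.substring(start, end);
            // Keep only segments containing at least one letter or digit
            // (drops spaces and punctuation, as a tokenizer would).
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "ภาษาไทยไม่มีการเว้นวรรคระหว่างคำ";
        // Boundaries here come from the JRE's Thai break data, not from ICU4j.
        System.out.println(breakWords(text, Locale.forLanguageTag("th")));
    }
}
```

For a language with spaces the same method behaves like a plain tokenizer: `breakWords("hello, world", Locale.ENGLISH)` yields `[hello, world]`.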

> Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene
> ---------------------------------------------------------------
>          Key: LUCENE-503
>          URL:
>      Project: Lucene - Java
>         Type: New Feature

>   Components: Analysis
>     Versions: 1.4
>     Reporter: Samphan Raruenrom
>  Attachments:,
> Thai text has no spaces between words. Usually, a dictionary-based algorithm is used
> to break a string into words. For Lucene to be usable for Thai, an Analyzer that knows how to
> break Thai text into words is needed.
> I've implemented such an Analyzer, ThaiAnalyzer, using ICU4j's DictionaryBasedBreakIterator
> for word breaking. I'll upload the code later.
> I'm normally a C++ programmer and very new to Java. Please review the code for any problems.
> One possible problem is that it requires ICU4j. I don't know whether this is OK.
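To illustrate what "dictionary-based" word breaking means, here is a toy greedy longest-matching segmenter. It is a hypothetical sketch, not the algorithm used by ICU4j or the JDK; the class name `LongestMatchSegmenter` and the four-word dictionary are invented for the example. At each position it takes the longest dictionary word that matches and otherwise falls back to a single character.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LongestMatchSegmenter {

    private final Set<String> dictionary;
    private final int maxWordLength;

    LongestMatchSegmenter(Set<String> dictionary) {
        this.dictionary = new HashSet<>(dictionary);
        int max = 1;
        for (String word : dictionary) {
            max = Math.max(max, word.length());
        }
        this.maxWordLength = max;
    }

    /** Greedy left-to-right longest matching; unmatched characters become one-char tokens. */
    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            // Try the longest candidate first and shrink until a dictionary word matches.
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical toy dictionary; a real word breaker uses a large lexicon.
        Set<String> dict = new HashSet<>(Arrays.asList("ภาษา", "ไทย", "ไม่", "มี"));
        LongestMatchSegmenter segmenter = new LongestMatchSegmenter(dict);
        System.out.println(segmenter.segment("ภาษาไทยไม่มี")); // [ภาษา, ไทย, ไม่, มี]
    }
}
```

Pure greedy matching can choose a wrong split: with the dictionary `{ab, abc, cd, d}` it segments `abcd` as `[abc, d]` rather than `[ab, cd]`, which is one reason practical dictionary-based breakers add lookahead or other heuristics.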

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators:
For more information on JIRA, see:

