lucene-dev mailing list archives

From "Nattapong Sirilappanich (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4253) ThaiAnalyzer fail to tokenize word.
Date Thu, 26 Jul 2012 09:04:33 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422970#comment-13422970 ]

Nattapong Sirilappanich commented on LUCENE-4253:
-------------------------------------------------

Hi Robert,

Based on your suggestion, I found the actual problem.
The "stopwords.txt" file in the package "org.apache.lucene.analysis.th" contains many
words that are stop words only for one specific type of usage, and that usage is stated
inside the file itself.
According to the javadoc, these words have been used by default since Lucene 3.6.

In my opinion, this set of words should not be used by default.
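
To illustrate the effect described above, here is a minimal, self-contained sketch. It does not use Lucene; the class StopFilterSketch, its filter method, and the ASCII placeholder tokens are all hypothetical stand-ins for already-segmented Thai words and for the shipped stop list. It only shows how a stop list written for one specific usage can silently drop ordinary tokens, while an empty stop set lets everything through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopFilterSketch {

    // Hypothetical stand-in for a stop filter (not the Lucene class):
    // keeps only the tokens that are absent from stopWords.
    public static List<String> filter(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<String>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Placeholder tokens, as if a tokenizer had already segmented the input.
        List<String> tokens = Arrays.asList("alpha", "beta", "gamma");

        // Assumption: a stop list tuned for one specific usage that happens
        // to contain words a general-purpose search still needs.
        Set<String> usageSpecificStops =
                new HashSet<String>(Arrays.asList("beta", "gamma"));

        // With the usage-specific list, most tokens disappear.
        System.out.println(filter(tokens, usageSpecificStops)); // prints [alpha]

        // With an empty stop set, every token survives.
        System.out.println(filter(tokens, Collections.<String>emptySet())); // prints [alpha, beta, gamma]
    }
}
```

In Lucene itself, I believe ThaiAnalyzer also has a constructor that takes a caller-supplied stop-word set, so passing an empty set may be a way to avoid the default list without patching createComponents. Please check the javadoc for the version in use.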
                
> ThaiAnalyzer fail to tokenize word.
> -----------------------------------
>
>                 Key: LUCENE-4253
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4253
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: Realtime Branch
>         Environment: Windows 7 SP1.
> Java 1.7.0-b147
>            Reporter: Nattapong Sirilappanich
>
> The method
> protected TokenStreamComponents createComponents(String,Reader)
> returns a component that is unable to tokenize Thai words.
> The current return statement is:
> return new TokenStreamComponents(source, new StopFilter(matchVersion, result, stopwords));
> As an experiment, I changed the return statement to:
> return new TokenStreamComponents(source, result);
> This gives me the correct result.


