lucene-java-user mailing list archives

From saisantoshi <saisantosh...@gmail.com>
Subject Re: Is StandardAnalyzer good enough for multi languages...
Date Wed, 09 Jan 2013 18:23:41 GMT
Thanks for all the responses. From the above, it sounds like there are two
options:

1. Use ICUTokenizer (is it in Lucene 4.0 or 4.1?). If it's only in 4.1, we
cannot use it at this time, as 4.1 has not been released yet.

2. Write a custom analyzer by extending StandardAnalyzer and adding filters
for the additional languages.

The problem that we are facing currently is described in detail at: 

http://lucene.472066.n3.nabble.com/Lucene-support-for-multi-byte-characters-2-4-0-version-td4031654.html
 
To summarize: we are having trouble tokenizing some Japanese keywords. When
uploading documents, people can type keywords in any language, and searches
on some of these Japanese keywords do not work with the StandardAnalyzer
(version 2.4.0).

Can you suggest a filter that we could integrate with StandardAnalyzer to
handle this?
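
To make the question concrete, here is a rough stand-alone sketch of the kind
of behavior we are hoping for. It is plain Java and uses no Lucene APIs; the
class and method names (CjkBigramSketch, tokenize) are made up for
illustration. It assumes that splitting runs of Han/Hiragana/Katakana
characters into overlapping bigrams would be acceptable for our keywords,
which is a common approach for CJK text but may not be the right one for us.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not a Lucene Analyzer or TokenFilter.
class CjkBigramSketch {

    // True for code points in the scripts commonly used to write Japanese.
    static boolean isCjk(int cp) {
        UnicodeScript s = UnicodeScript.of(cp);
        return s == UnicodeScript.HAN
            || s == UnicodeScript.HIRAGANA
            || s == UnicodeScript.KATAKANA;
    }

    // Splits text into lowercased words for non-CJK letters/digits and
    // overlapping two-character tokens for runs of CJK characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isCjk(cp)) {
                // Collect the whole run of CJK code points.
                List<Integer> run = new ArrayList<Integer>();
                while (i < text.length() && isCjk(text.codePointAt(i))) {
                    int c = text.codePointAt(i);
                    run.add(c);
                    i += Character.charCount(c);
                }
                if (run.size() == 1) {
                    // A lone CJK character is emitted as a single token.
                    tokens.add(new String(Character.toChars(run.get(0))));
                }
                for (int k = 0; k + 1 < run.size(); k++) {
                    StringBuilder bigram = new StringBuilder();
                    bigram.appendCodePoint(run.get(k));
                    bigram.appendCodePoint(run.get(k + 1));
                    tokens.add(bigram.toString());
                }
            } else if (Character.isLetterOrDigit(cp)) {
                // Collect a run of non-CJK letters and digits as one word.
                StringBuilder word = new StringBuilder();
                while (i < text.length()) {
                    int c = text.codePointAt(i);
                    if (!Character.isLetterOrDigit(c) || isCjk(c)) {
                        break;
                    }
                    word.appendCodePoint(Character.toLowerCase(c));
                    i += Character.charCount(c);
                }
                tokens.add(word.toString());
            } else {
                // Skip whitespace and punctuation.
                i += Character.charCount(cp);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "\u65E5\u672C\u8A9E" is the three-character word for "Japanese language".
        System.out.println(tokenize("Lucene \u65E5\u672C\u8A9E"));
    }
}
```

With this sketch, a keyword made of three Han characters is indexed as two
overlapping two-character tokens, so a search for any two adjacent characters
would match. We do not know whether an existing Lucene filter already does
something like this for 2.4.0.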

Thanks,
Sai.



