lucene-java-user mailing list archives

From "Chris Lu" <>
Subject Re: Lucene for chinese search
Date Sun, 17 Jun 2007 18:09:30 GMT
There are three things to watch out for with Chinese or other CJK languages:

1. The content source or database needs to be encoded in UTF-8.
2. StandardAnalyzer doesn't support Chinese words well. Use either
ChineseAnalyzer or CJKAnalyzer. In my experience, CJKAnalyzer is a
little better.
3. The user's query should be encoded in UTF-8.
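
To make points 2 and 3 concrete, here is a rough sketch using only the Java
standard library. It does not call Lucene. The class name CjkQuerySketch and
both helper methods are made up for illustration. The bigrams() helper only
imitates the overlapping two-character tokens that CJKAnalyzer produces for a
run of CJK text; the real analyzer also handles Latin text, whitespace, and
stop words, so treat this as an approximation of the idea rather than its
implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

public class CjkQuerySketch {

    // Point 3: decode a percent-encoded query parameter explicitly as UTF-8
    // instead of relying on the platform or container default charset.
    public static String decodeQuery(String rawParam) {
        try {
            return URLDecoder.decode(rawParam, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Point 2 (illustration only): split a run of CJK text into overlapping
    // two-character tokens. Assumes the whole input is one CJK run.
    public static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (cps.length == 1) {
            tokens.add(text); // a single character is kept as its own token
            return tokens;
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Simulate what a browser sends when the form is submitted as UTF-8.
        String raw = URLEncoder.encode("中文搜索", "UTF-8");
        String query = decodeQuery(raw);
        System.out.println(raw);
        System.out.println(bigrams(query));
    }
}
```

If the query were decoded with a different charset (for example ISO-8859-1),
the tokens would no longer match what was indexed from UTF-8 content, which
is why the encoding has to agree on both the indexing and the query side.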

Chris Lu
Instant Scalable Full-Text Search On Any Database/Application
Lucene Database Search in 3 minutes:

On 6/17/07, <> wrote:
> Hi,
> I would like to know whether StandardAnalyzer allows searching of Chinese
> words?
> And in order to support Chinese searching, is there any encoding needed in
> order to develop the application?
> I'm currently using Jetty as the web server and JSP for the application, and
> the search results will be saved in an XML file and displayed using XSL. So is
> there encoding needed for any of the files (XML, XSL, etc.) as well as during
> parsing of the query?
> Thanks a lot
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
