lucene-java-user mailing list archives

From "Chris Lu" <chris...@gmail.com>
Subject Re: Lucene for chinese search
Date Sun, 17 Jun 2007 18:09:30 GMT
There are three things to watch out for with Chinese or other CJK languages:

1. The content source or database needs to be encoded in UTF-8.
2. StandardAnalyzer doesn't handle Chinese words well. Use either
ChineseAnalyzer or CJKAnalyzer. In my experience, CJKAnalyzer is a
little better.
3. The user's query should be encoded in UTF-8 as well.
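
To make points 2 and 3 a bit more concrete, here is a rough sketch in plain
Java with no Lucene dependency. The class and method names (CjkSearchSketch,
decodeQuery, bigrams) are made up for illustration. decodeQuery shows a
percent-encoded query parameter being decoded as UTF-8. bigrams only mimics
the general idea behind CJKAnalyzer, which indexes overlapping two-character
terms (ChineseAnalyzer emits one term per character); it is not Lucene's
actual tokenizer and it assumes the input is a single run of CJK characters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only; not part of Lucene.
final class CjkSearchSketch {

    // Point 3: decode the raw (percent-encoded) query parameter as UTF-8.
    static String decodeQuery(String rawParam) {
        try {
            return URLDecoder.decode(rawParam, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Point 2: split a run of CJK text into overlapping two-character terms.
    // Simplified: assumes the whole string is CJK, with no spaces or Latin text.
    static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray(); // code points, not UTF-16 units
        List<String> terms = new ArrayList<String>();
        if (cps.length == 1) {
            terms.add(new String(cps, 0, 1));    // a lone character stays as-is
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            terms.add(new String(cps, i, 2));    // characters i and i+1
        }
        return terms;
    }

    public static void main(String[] args) {
        // "%E4%B8%AD%E6%96%87" is the UTF-8 percent-encoding of U+4E2D U+6587.
        String query = decodeQuery("%E4%B8%AD%E6%96%87");
        System.out.println(query.length());      // number of UTF-16 units decoded

        // Four CJK characters written as escapes so the source file's own
        // encoding does not matter.
        System.out.println(bigrams("\u4e2d\u6587\u641c\u7d22"));
    }
}
```

With overlapping pairs, a two-character query matches wherever those two
characters appear next to each other in the indexed text, which may be part
of why CJKAnalyzer tends to work a little better for multi-character words.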

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes


On 6/17/07, leelb@xedge.com.sg <leelb@xedge.com.sg> wrote:
> Hi,
>
> I would like to know whether StandardAnalyzer supports searching for Chinese
> words.
>
> Also, to support Chinese search, does the application need any particular
> encoding?
>
> I'm currently using Jetty as the web server and JSP for the application; search
> results are saved to an XML file and displayed using XSL. Is a specific
> encoding needed for any of these files (XML, XSL, etc.), as well as when
> parsing the query?
>
> Thanks a lot
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
