lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Lucene for chinese search
Date Fri, 22 Jun 2007 10:37:19 GMT
Regarding point #2: if neither of those analyzers works for you for some reason, you could
always try the n-gram tokenizers and token filters instead:

$ ll analyzers/src/java/org/apache/lucene/analysis/ngram/
total 48
-rw-rw-r--  1 otis otis 4934 Mar  2 16:32 EdgeNGramTokenFilter.java
-rw-rw-r--  1 otis otis 4617 Feb 21 15:33 EdgeNGramTokenizer.java
-rw-rw-r--  1 otis otis 3257 Mar  2 17:12 NGramTokenFilter.java
-rw-rw-r--  1 otis otis 3103 Mar  2 16:33 NGramTokenizer.java
drwxrwxr-x  7 otis otis 4096 May 31 10:11 .svn/
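
To give a rough idea of what n-gram tokenization produces for Chinese text, here is a minimal plain-Java sketch. It does not use the Lucene classes listed above, and the names NGramSketch and ngrams are hypothetical, chosen only for this illustration. It splits a string into overlapping n-grams counted in Unicode code points:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a plain-Java sketch of overlapping n-grams.
// This is NOT the Lucene NGramTokenizer; the class and method names are hypothetical.
final class NGramSketch {

    /** Returns all overlapping n-grams of the text, counted in Unicode code points. */
    static List<String> ngrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Work on code points so characters outside the BMP are not split in half.
        int[] codePoints = text.codePoints().toArray();
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= codePoints.length; i++) {
            grams.add(new String(codePoints, i, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587\u641c\u7d22" is the four-character string 中文搜索 ("Chinese search").
        // With n = 2 this yields three bigrams: 中文, 文搜, 搜索.
        System.out.println(ngrams("\u4e2d\u6587\u641c\u7d22", 2));
    }
}
```

Indexing overlapping bigrams like this lets a query match any two-character sequence without a dictionary-based word segmenter, at the cost of a larger index.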

Otis
--
Lucene Consulting -- http://lucene-consulting.com/


----- Original Message ----
From: Chris Lu <chris.lu@gmail.com>
To: java-user@lucene.apache.org
Sent: Sunday, June 17, 2007 8:09:30 PM
Subject: Re: Lucene for chinese search

There are three things to watch out for with Chinese or other CJK languages:

1. The content source or database needs to be encoded in UTF-8.
2. StandardAnalyzer doesn't handle Chinese words well. Use either
ChineseAnalyzer or CJKAnalyzer. My experience is that CJKAnalyzer is a
little better.
3. The user's query should be encoded in UTF-8.
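
As a rough illustration of points 1 and 3, the sketch below shows explicit UTF-8 handling with standard-library calls only. The class and method names (Utf8QuerySketch, decodeQuery, encodeQuery, fromUtf8Bytes) are hypothetical, and it assumes Java 10 or later for the Charset overloads of URLDecoder.decode and URLEncoder.encode. It is not tied to any particular web container.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: explicit UTF-8 handling for content bytes and query strings.
// All names here are hypothetical; requires Java 10+ for the Charset overloads.
final class Utf8QuerySketch {

    /** Decodes a percent-encoded query parameter, interpreting the bytes as UTF-8. */
    static String decodeQuery(String rawParam) {
        return URLDecoder.decode(rawParam, StandardCharsets.UTF_8);
    }

    /** Percent-encodes a query string as UTF-8, e.g. for building a search URL. */
    static String encodeQuery(String query) {
        return URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    /** Converts raw content bytes to a String, naming the charset instead of using the platform default. */
    static String fromUtf8Bytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the two-character string 中文 ("Chinese").
        String encoded = encodeQuery("\u4e2d\u6587");
        System.out.println(encoded);
        // Round trip: decoding the encoded form gives back the original string.
        System.out.println(decodeQuery(encoded).equals("\u4e2d\u6587"));
    }
}
```

In a servlet or JSP application, the usual counterpart is to call request.setCharacterEncoding("UTF-8") before reading any parameters, so the container decodes the query the same way.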

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes


On 6/17/07, leelb@xedge.com.sg <leelb@xedge.com.sg> wrote:
> Hi,
>
> I would like to know whether StandardAnalyzer allows searching of Chinese
> words.
>
> Also, to support Chinese searching, is any particular encoding needed when
> developing the application?
>
> I'm currently using Jetty as the web server and JSP for the application; search
> results are saved in an XML file and displayed using XSL. So is a particular
> encoding needed for any of the files (XML, XSL, etc.), as well as when
> parsing the query?
>
> Thanks a lot
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
