lucene-java-user mailing list archives

From "Lee Li Bin" <>
Subject RE: Lucene for chinese search
Date Mon, 18 Jun 2007 13:13:03 GMT


Indexing works fine: when I open the index file in Notepad, it contains
Chinese text matching my data source (XML).

In the JSP I tried using UTF-8, and in the Java servlet I tried converting
the query with getBytes() using 'UTF-8', ISO8859_1, or Cp1252, but searching
still fails: no results are returned for Chinese terms.
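One common cause of this symptom, offered as an assumption rather than a confirmed diagnosis: the browser sends the query as UTF-8, but the servlet container decodes the parameter as ISO-8859-1. That was the default for GET query strings in Tomcat versions of this era unless the Connector's URIEncoding attribute is set; for POST bodies, request.setCharacterEncoding("UTF-8") has to be called before the first parameter is read. Because ISO-8859-1 maps every byte to exactly one char, the original bytes can still be recovered. The sketch below (class and method names are illustrative, and StandardCharsets needs Java 7+) simulates the mis-decoding with plain JDK classes and shows the round-trip repair.

```java
import java.nio.charset.StandardCharsets;

public class QueryParamRepair {

    // Simulates a container that receives UTF-8 bytes but decodes them as
    // ISO-8859-1: every byte becomes one char, so nothing is lost yet.
    static String simulateContainerDecoding(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Recovers the original bytes (ISO-8859-1 is one byte per char) and
    // decodes them again with the charset the browser actually used.
    static String repair(String misDecoded) {
        byte[] rawBytes = misDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source stays ASCII.
        String query = "\u4e2d\u6587";
        String received = simulateContainerDecoding(query);
        System.out.println(received.length());               // 6: one char per UTF-8 byte
        System.out.println(repair(received).equals(query));  // true
    }
}
```

Fixing the container configuration (URIEncoding, setCharacterEncoding) is the cleaner solution; the byte round trip is a workaround for when that configuration cannot be changed.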

My data source mixes English and Chinese text. Searching works for English
terms, but the Chinese characters show up as '???' in the result output.
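The '???' output is consistent with Chinese text being encoded into a charset that cannot represent it: String.getBytes replaces each unmappable char with the charset's replacement byte, '?'. A servlet response defaults to ISO-8859-1 unless a charset is set, for example with response.setContentType("text/html; charset=UTF-8") or the contentType attribute of the JSP page directive. This stdlib-only sketch (names are illustrative) reproduces the effect; it is a simplification of what happens between the servlet and the browser, not a trace of it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    // Encodes with the given charset and decodes again, roughly what happens
    // when text is written to a response in that charset and read back.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // unmappable chars become '?'
        return new String(bytes, charset);
    }

    static String throughLatin1(String text) {
        return roundTrip(text, StandardCharsets.ISO_8859_1);
    }

    static String throughUtf8(String text) {
        return roundTrip(text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Lucene" plus two Chinese characters, as escapes to keep the source ASCII.
        String title = "Lucene \u4e2d\u6587";
        System.out.println(throughLatin1(title));             // Lucene ??
        System.out.println(throughUtf8(title).equals(title)); // true
    }
}
```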

Please advise, or send some samples or solutions.

-----Original Message-----
From: Mathieu Lecarme [] 
Sent: Monday, June 18, 2007 8:58 PM
Subject: Re: Lucene for chinese search

Lee Li Bin wrote:
> Hi,
> I am still having problems searching for Chinese words.
> The XML data source file and the analyzer are already encoded.
> I have tested StandardAnalyzer, CJKAnalyzer, and ChineseAnalyzer, but I
> still can't get any results.
> 1.	Do we need any encoding configuration in Apache Tomcat for Chinese
> 	search using Lucene?
> 2.	Do we need a JSP meta / page encoding? What encoding should the JSP
> 	use?
Try a simple JUnit test first; after that, you can fight with the UTF-8 parameters.
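Before (or alongside) the Lucene-level JUnit test, it may help to confirm that the string handed to the analyzer or QueryParser still contains the intended characters. A minimal sketch, using a hypothetical helper name: printing each non-ASCII char as a \uXXXX escape makes a correctly decoded query (\u4E2D\u6587) easy to tell apart from UTF-8 bytes mis-decoded as ISO-8859-1 (\u00E4\u00B8\u00AD for the first character alone). The escaped form can then be pasted as a string literal into the unit test, which takes the servlet container out of the picture.

```java
public class CodePointDump {

    // Illustrative debugging helper: keeps printable ASCII as-is and renders
    // every other char as a backslash-u escape with four uppercase hex digits.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7f) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A correctly decoded query: two CJK code points.
        System.out.println(escape("\u4e2d\u6587"));
        // The first Chinese character's UTF-8 bytes mis-decoded as ISO-8859-1:
        // three Latin-1 code points instead of one CJK code point.
        System.out.println(escape("\u00e4\u00b8\u00ad"));
    }
}
```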


To unsubscribe, e-mail:
For additional commands, e-mail:
