lucene-java-user mailing list archives

From Karl Øie <>
Subject Re: Urgent, please help Index/Search in UTF-8 ???
Date Mon, 11 Apr 2005 10:09:30 GMT
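If you use a servlet and an HTML form to feed queries to the QueryParser,
take good care of every encoding-related setting around the servlet
container. If you, like me, use Tomcat, you might have to re-decode the
query into a correct Java String before you pass it to Lucene. (Java
strings are UTF-16 internally; the point is that the UTF-8 bytes the
browser sent must be decoded as UTF-8, not as ISO-8859-1.)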
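On the container side, one possible setup, assuming Tomcat and queries sent with GET: the query string is decoded by the HTTP connector, and the URIEncoding attribute of the <Connector> element in conf/server.xml selects the charset it uses (older Tomcat releases default to ISO-8859-1). The port and timeout values below are placeholders; keep whatever your existing connector already has. For POST forms, the analogous step is calling request.setCharacterEncoding("UTF-8") before the first getParameter() call.

```xml
<!-- conf/server.xml: port and timeout are placeholder values -->
<!-- URIEncoding tells the connector to decode the query string as UTF-8 -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8" />
```

If the connector already decodes UTF-8 like this, the manual re-decoding in the servlet should be skipped; decoding twice garbles every non-ASCII character.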

read this:

then in your receiving servlet:

String query_string = request.getParameter("query");

String query_string = new 

Then pass query_string to Lucene. This ensures that the string returned
by getParameter() has been decoded with the right character encoding.
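A minimal sketch of that re-decoding step, assuming the container decoded the browser's UTF-8 bytes as ISO-8859-1 (the usual Tomcat default at the time). ISO-8859-1 maps every byte to exactly one char, so getBytes(ISO_8859_1) gives back the original bytes, which can then be decoded again as UTF-8. QueryRecoder and recodeUtf8 are hypothetical names for illustration; StandardCharsets needs Java 7 or later (older code passes "ISO-8859-1" and "UTF-8" as strings instead).

```java
import java.nio.charset.StandardCharsets;

public class QueryRecoder {

    /**
     * Hypothetical helper: re-decodes a parameter that the container
     * decoded as ISO-8859-1 although the browser sent UTF-8 bytes.
     * Returns null if the parameter was absent.
     */
    public static String recodeUtf8(String misdecoded) {
        if (misdecoded == null) {
            return null; // getParameter() returns null for a missing parameter
        }
        // ISO-8859-1 is a 1:1 byte <-> char mapping, so this recovers the raw bytes.
        byte[] rawBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Decode the recovered bytes with the charset the browser actually used.
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of javac's file encoding.
        String typed = "\u4e2d\u6587 Portugu\u00eas \u65e5\u672c\u8a9e";
        // Simulate the container: UTF-8 bytes decoded as ISO-8859-1.
        String fromGetParameter = new String(
                typed.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        String queryString = recodeUtf8(fromGetParameter);
        System.out.println(queryString.equals(typed)); // prints true
    }
}
```

In the servlet this would become String query_string = QueryRecoder.recodeUtf8(request.getParameter("query")); before handing query_string to the QueryParser. Apply it only when the container really decoded the parameter as ISO-8859-1; on a connector already configured for UTF-8 it would corrupt the query.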

Hope this helps!

Best regards,
Karl Øie

On 11 Apr 2005, at 11:54, Eric Chow wrote:

> Hello,
> I am a beginner with Lucene.
> My files contain different languages (English, Chinese,
> Portuguese, Japanese and some other Asian, non-Latin languages),
> and they are always mixed together in one file.
> Therefore, I have to use UTF-8 to save the contents.
> I am now developing a web-based search engine. I use Lucene to create
> an index for those files and search it from the web. The charset of the
> web page is UTF-8, but searches return no results.
> I tried several analyzers (CJKAnalyzer, ChineseAnalyzer,
> StandardAnalyzer, SimpleAnalyzer), but it still failed.
> Finally, I tested using the original charset; for example, for the
> Chinese content I used Big5, and searching works very well. English
> content, of course, is no problem.
> But I can't get it to work with UTF-8 as the charset for the documents.
> Any suggestions or examples?
> Best regards,
> Eric
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
- ...I wonder if the really nerdy Klingons learn how to speak English?
