lucene-java-user mailing list archives

From Zilverline info <i...@zilverline.org>
Subject Re: Urgent, please help Index/Search in UTF-8 ???
Date Mon, 11 Apr 2005 10:10:55 GMT
For instance look at http://www.zilverline.org/zilverlineweb/space/faq

Michael

Karl Øie wrote:

> If you use a servlet and an HTML form to feed queries to the 
> QueryParser, check every encoding setting around the servlet 
> container. If you, like me, use Tomcat, you might have to re-decode the 
> query string as UTF-8 before you pass it to Lucene.
>
>
> read this:
>
> http://www.crazysquirrel.com/compgen/form-encoding.php
>
>
> then in your receiving servlet:
>
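> // Assumes the container decoded the parameter as ISO-8859-1
> // (the servlet default when the request declares no encoding)
> String raw = request.getParameter("query");
> String encoding = request.getCharacterEncoding(); // null if not declared
> if (encoding == null) encoding = "UTF-8";
> // Recover the original bytes, then decode them with the intended charset
> String query_string = new String(raw.getBytes("ISO-8859-1"), encoding);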
>
> then pass query_string to Lucene. This ensures that the string fetched 
> by getParameter() is decoded with the right encoding.
>
> Hope this helps!
>
> Regards, Karl Øie
>
> On 11. apr. 2005, at 11.54, Eric Chow wrote:
>
>> Hello,
>>
>>
>> I am a beginner in using Lucene.
>>
>>
>> My files contain several languages (English, Chinese,
>> Portuguese, Japanese, and some other Asian, non-Latin languages),
>> often mixed within a single file.
>> Therefore, I have to use UTF-8 to save the contents.
>>
>> I am now developing a web-based search engine. I use Lucene to
>> index those files and search them from a web page. The charset of the
>> web page is UTF-8, but searches return nothing.
>>
>> I tried several analyzers (CJKAnalyzer, ChineseAnalyzer,
>> StandardAnalyzer, SimpleAnalyzer), and all of them failed.
>>
>> Finally, I tested with the original charsets. For example, with the
>> Chinese content saved as BIG5, search works very well. The
>> English content, of course, is no problem.
>>
>> But I cannot get UTF-8 to work as the charset for the documents. Any
>> suggestions or examples?
>>
>>
>> Best regards,
>> Eric
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> - ...I wonder if the really nerdy Klingons learn how to speak english?
>
>

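The re-decoding step Karl describes can be tried outside a servlet container. The following is only a sketch: the class name QueryRecoder and the method recode are made up for illustration, the container's behaviour is simulated by decoding UTF-8 bytes as ISO-8859-1, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene or the servlet API.
public class QueryRecoder {

    // Re-decodes a string that was wrongly decoded as ISO-8859-1
    // although the browser actually sent UTF-8 bytes.
    public static String recode(String misdecoded) {
        // ISO-8859-1 maps each byte to exactly one char, so this
        // recovers the bytes the browser originally sent.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Øie" followed by two Chinese characters, written with escapes
        // so the source file encoding does not matter.
        String typed = "\u00D8ie \u4E2D\u6587";

        // Simulate a container that decodes the UTF-8 form data as ISO-8859-1.
        String misdecoded = new String(
                typed.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        System.out.println(typed.equals(misdecoded));         // false
        System.out.println(typed.equals(recode(misdecoded))); // true
    }
}
```

This only helps when the container really did use ISO-8859-1. If the request was already decoded as UTF-8, applying recode again would damage the query.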

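On the indexing side of Eric's question, one thing worth checking (an assumption, not a diagnosis of his setup) is how the files are read before they reach Lucene. A FileReader created without a charset uses the platform default, which may not be UTF-8. The sketch below reads a file with an explicit charset; Utf8FileReader and readUtf8 are hypothetical names, and the resulting String would then be handed to whatever indexing code is in use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene.
public class Utf8FileReader {

    // Reads the whole file as UTF-8, independent of the platform default charset.
    public static String readUtf8(Path file) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int count;
            while ((count = in.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a small mixed-language sample as UTF-8, then read it back.
        String sample = "\u4E2D\u6587 portugu\u00EAs English";
        Path file = Files.createTempFile("utf8-sample", ".txt");
        Files.write(file, sample.getBytes(StandardCharsets.UTF_8));

        System.out.println(sample.equals(readUtf8(file))); // true
        Files.delete(file);
    }
}
```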

