Message-ID: <6e3ae6310706181101h36c7b52gb255fa053398e4ca@mail.gmail.com>
Date: Mon, 18 Jun 2007 11:01:49 -0700
From: "Chris Lu"
To: java-user@lucene.apache.org
Subject: Re: Lucene for chinese search
In-Reply-To: <200706181321.l5IDLeNT002721@ns21.webhostsg.com>

Basically, wherever an encoding can be set, it should be UTF-8. The
servlet also has an encoding setting. In your case, change the Tomcat
setting. The encoding also matters when the JSP page is rendered.

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

On 6/18/07, Lee Li Bin wrote:
>
> Hi,
>
> Indexing works without problems: when I open the index file in
> Notepad, I can see Chinese text matching my data source (XML).
>
> When I use UTF-8 in the JSP and call getBytes with 'utf-8', ISO8859_1,
> or Cp1252 in the Java servlet, the search still fails: no results are
> displayed for Chinese terms.
>
> I mixed English and Chinese text in my data source. The search works
> for English terms, but Chinese characters are displayed as '???' in
> the result output.
>
> Please advise or send some samples or solutions.
>
> Thanks.
>
> -----Original Message-----
> From: Mathieu Lecarme [mailto:mathieu@garambrogne.net]
> Sent: Monday, June 18, 2007 8:58 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene for chinese search
>
> Lee Li Bin wrote:
> > Hi,
> >
> > I still have a problem searching for Chinese words.
> > The XML file that is the data source, and the analyzer, have already
> > been encoded.
> > I have tested StandardAnalyzer, CJKAnalyzer, and ChineseAnalyzer, but
> > I still can't get any results.
> >
> > 1. Do we need any encoding configuration in Apache Tomcat for Chinese
> > search using Lucene?
> >
> > 2. Do we need to use a JSP meta tag or page encoding? What is the
> > encoding for JSP?
> >
> Try a simple JUnit test first; after that you can fight with the UTF-8
> parameters.
>
> M.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
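
The advice in this thread (use UTF-8 at every step) can be illustrated
without Tomcat or Lucene. The following is only a minimal sketch, not
the poster's actual code: the class name EncodingDemo and its method
names are made up for illustration, and it assumes Java 7 or later for
StandardCharsets. It shows how encoding Chinese text with ISO-8859-1
produces the '?' characters described above, and how a UTF-8 request
decoded as ISO-8859-1 becomes garbled but is still recoverable.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; nothing here comes from the original thread.
class EncodingDemo {

    // Encodes text with ISO-8859-1, which cannot represent Chinese.
    // String.getBytes replaces each unmappable character with '?',
    // so the information is lost for good.
    static String lossyRoundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Encodes and decodes with UTF-8 on both sides: nothing is lost.
    static String utf8RoundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Simulates a container that receives UTF-8 bytes from the browser
    // but decodes them as ISO-8859-1. The result is garbled, yet every
    // original byte survives as one character.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undoes misdecode by recovering the raw bytes and decoding them
    // as UTF-8. This is a workaround, not a substitute for configuring
    // the container correctly.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes so the
        // source file encoding does not matter.
        String query = "\u4e2d\u6587";

        System.out.println(lossyRoundTrip(query).equals("??"));  // true
        System.out.println(utf8RoundTrip(query).equals(query));  // true
        System.out.println(misdecode(query).length());           // 6
        System.out.println(repair(misdecode(query)).equals(query)); // true
    }
}
```

In a servlet container, the corresponding settings would typically be
request.setCharacterEncoding("UTF-8") before reading parameters, the
URIEncoding attribute on the Tomcat Connector in server.xml for GET
query strings, and pageEncoding plus a UTF-8 charset in the contentType
of the JSP page directive. Which of these applies here is an
assumption, since the thread does not state the Tomcat version or
configuration.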