From: waynelam
Date: Tue, 04 Sep 2012 13:03:11 +0800
To: solr-user@lucene.apache.org
Subject: Searching of Chinese characters and English
Message-ID: <50458B8F.4070500@ln.edu.hk>
In-Reply-To: <504588FD.3030704@ln.edu.hk>

Hi all,

I tried to modify the schema.xml and solrconfig.xml that come with the Drupal "search_api_solr" module so that they are suitable for a CJK environment. In "Field Analysis" I can see Chinese text being cut into two-character tokens.

If I use the following query

my_ip_address:8080/solr/select?indent=on&version=2.2&fq=t_title:"Find"&start=0&rows=10&fl=t_title

I can see it returning results.

The problem appears when I change the search keyword for one of my fields (e.g. t_title) to Chinese characters: it always shows no results. This is strange, because if a title contains both Chinese and English (e.g. "testing" followed by Chinese characters) and I search for just the English part (e.g. fq=t_title:"testing"), I find the result perfectly. The problem only happens when searching for Chinese characters.

I would much appreciate it if you could show me which part I got wrong.

Thanks
Wayne

*My Settings:*
Java: 1.6.0_24
Solr: version 3.6.1
Tomcat: version 6.0.35

*My schema.xml* (I highlighted the places I changed from the default)

[schema.xml markup was not preserved in the archive; the only surviving value is "id"]
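
For reference, here is a minimal sketch of how the select query from this message could be built with a Chinese term sent as percent-encoded UTF-8, so that the request itself can be ruled out as the source of the problem. The helper name `build_select_url`, the `http://` scheme, and the sample term "中文" are assumptions made for illustration; the original Chinese keyword did not survive in this message.

```python
from urllib.parse import urlencode


def build_select_url(base_url, field, term, rows=10):
    """Build a Solr /select URL whose parameters are percent-encoded as UTF-8.

    Hypothetical helper: the parameter set mirrors the query in the message.
    """
    params = [
        ("indent", "on"),
        ("version", "2.2"),
        # Phrase filter query on one field, e.g. t_title:"..."
        ("fq", '{}:"{}"'.format(field, term)),
        ("start", "0"),
        ("rows", str(rows)),
        ("fl", field),
    ]
    # urlencode percent-encodes non-ASCII characters using the given encoding
    return base_url + "/select?" + urlencode(params, encoding="utf-8")


# "my_ip_address" is the placeholder host from the message; the term is a
# made-up example, not the keyword from the original report.
url = build_select_url("http://my_ip_address:8080/solr", "t_title", "\u4e2d\u6587")
print(url)
```

Sending UTF-8 bytes only helps if the servlet container also decodes the query string as UTF-8. On Tomcat that depends on the connector's URIEncoding setting, which is worth checking alongside the analyzer configuration.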