From: Li Li
To: java-user@lucene.apache.org
Subject: Re: Question about chinese and WildcardQuery
Date: Thu, 28 Jun 2012 14:43:45 +0800

In Chinese there are no word boundaries between words. Text is written like
"Iamok", and a tokenizer has to split it into "I am ok". If you want to search
for *amo*, you have to treat the whole string "Iamok" as one token.

In Chinese, fuzzy search is not very useful. Even with StandardAnalyzer it is
fine to use a boolean query: "Iamok" is tokenized as I, a, m, o, k, so a
boolean query such as +a +m +o works. Chinese has many characters (more than
3000 in common use), and words are very short (most words have only 2
characters).

On Thu, Jun 28, 2012 at 2:31 PM, Paco Avila wrote:
> Thanks, using WhitespaceAnalyzer works, but I don't understand why
> StandardAnalyzer does not work when, according to the ChineseAnalyzer
> deprecation notice, I should use StandardAnalyzer:
>
> @deprecated Use {@link StandardAnalyzer} instead, which has the same
> functionality.
>
> It is very annoying.
>
> 2012/6/27 Li Li
>
>> StandardAnalyzer will segment each character into its own token. You should
>> use WhitespaceAnalyzer, or your own analyzer that can tokenize it as one
>> token, for wildcard search.
>> On 2012-6-27 at 6:20 PM, "Paco Avila" wrote:
>>
>> > Hi there,
>> >
>> > I have to index Chinese content and I don't get the expected results when
>> > searching. It seems that WildcardQuery does not work properly with
>> > Chinese characters. See attached sample code.
>> >
>> > I store the string "专项信息管理.doc" using the StandardAnalyzer and after that
>> > search for "专项信*", and no result is given. AFAIK, it should match the
>> > "专项信息管理.doc" string, but it doesn't :(
>> >
>> > NOTE: I am using Lucene 3.1.0
>> >
>> > Regards.
>> > --
>> > http://www.openkm.com
>> > http://www.guia-ubuntu.org
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>
> --
> OpenKM
> http://www.openkm.com
> http://www.guia-ubuntu.org
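
The tokenization point made in the reply at the top of this message ("Iamok"
split into I, a, m, o, k) can be illustrated with a small plain-Java sketch.
This is a toy model and does not use Lucene: the class WildcardTokenDemo and
all of its methods are hypothetical names made up for this illustration, and
the per-character split only imitates what the thread says StandardAnalyzer
does to Chinese text. It assumes a wildcard is matched against one token at a
time.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical helper names, no Lucene classes involved.
class WildcardTokenDemo {

    // Imitates per-character segmentation: one token per code point.
    static List<String> perCharacterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Imitates an analyzer that keeps unbroken text as a single token.
    static List<String> singleToken(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text);
        return tokens;
    }

    // Models a wildcard like *amo*: some single token must contain the infix.
    static boolean anyTokenContains(List<String> tokens, String infix) {
        for (String token : tokens) {
            if (token.contains(infix)) {
                return true;
            }
        }
        return false;
    }

    // Models a boolean query like +a +m +o: every required term is a token.
    static boolean containsAll(List<String> tokens, String... required) {
        for (String term : required) {
            if (!tokens.contains(term)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> perChar = perCharacterTokens("Iamok");
        List<String> whole = singleToken("Iamok");

        System.out.println("per-character tokens: " + perChar);
        System.out.println("*amo* on per-character tokens: "
                + anyTokenContains(perChar, "amo"));
        System.out.println("*amo* on a single token: "
                + anyTokenContains(whole, "amo"));
        System.out.println("+a +m +o on per-character tokens: "
                + containsAll(perChar, "a", "m", "o"));
    }
}
```

In this model the wildcard fails on the per-character tokens because no
single token is long enough to contain "amo", while the same wildcard succeeds
when the text is kept as one token. The boolean form succeeds on the
per-character tokens, but note that in this sketch it ignores order and
adjacency, so it would also accept text where the same characters appear in a
different arrangement.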