Subject: svn commit: r755666 - /lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/cjk/CJKTokenizer.java
Date: Wed, 18 Mar 2009 17:28:53 -0000
To: java-commits@lucene.apache.org
From: mikemccand@apache.org
Reply-To: java-dev@lucene.apache.org
Message-Id: <20090318172854.199B5238889D@eris.apache.org>

Author: mikemccand
Date: Wed Mar 18 17:28:53 2009
New Revision: 755666

URL: http://svn.apache.org/viewvc?rev=755666&view=rev
Log:
LUCENE-1490: fix latin1 conversion of HALFWIDTH_AND_FULLWIDTH_FORMS characters to only apply to the correct subset

Modified:
    lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/cjk/CJKTokenizer.java

Modified: lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/cjk/CJKTokenizer.java
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/cjk/CJKTokenizer.java?rev=755666&r1=755665&r2=755666&view=diff
==============================================================================
--- lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/cjk/CJKTokenizer.java (original)
+++ lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/cjk/CJKTokenizer.java Wed Mar 18 17:28:53 2009
@@ -148,10 +148,12 @@
                         || (ub == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS)
                    ) {
                     if (ub == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS) {
-                        /** convert HALFWIDTH_AND_FULLWIDTH_FORMS to BASIC_LATIN */
-                        int i = (int) c;
+                      int i = (int) c;
+                      if (i >= 65281 && i <= 65374) {
+                        /** convert certain HALFWIDTH_AND_FULLWIDTH_FORMS to BASIC_LATIN */
                         i = i - 65248;
                         c = (char) i;
+                      }
                     }
 
                     // if the current character is a letter or "_" "+" "#"
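
To make the arithmetic in this hunk concrete, here is a minimal standalone sketch; it is not the CJKTokenizer code itself. The class name `FullwidthFoldSketch`, the method `fold`, and the named constants are hypothetical and exist only for illustration. Only the decimal bounds 65281..65374 and the offset 65248 come from the commit.

```java
// Hypothetical helper, not part of CJKTokenizer: it isolates the range check
// from the commit so the arithmetic can be read on its own.
final class FullwidthFoldSketch {
    // U+FF01 (fullwidth '!') through U+FF5E (fullwidth '~'): decimal 65281..65374.
    static final int FULLWIDTH_START = 0xFF01;
    static final int FULLWIDTH_END = 0xFF5E;
    // 65248 == 0xFEE0, the fixed distance down to the Basic Latin counterpart.
    static final int OFFSET = 0xFEE0;

    // Returns the Basic Latin counterpart for fullwidth ASCII variants and
    // leaves every other character (including halfwidth katakana) unchanged.
    static char fold(char c) {
        if (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS
                && c >= FULLWIDTH_START && c <= FULLWIDTH_END) {
            return (char) (c - OFFSET);
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(fold('\uFF21'));       // fullwidth 'A', prints A
        System.out.println((int) fold('\uFF76')); // halfwidth katakana KA, prints 65398
    }
}
```

Without the range check, subtracting 65248 from a halfwidth katakana character such as U+FF76 would yield U+0096, a control character rather than a Basic Latin letter. The bounds confine the shift to the fullwidth ASCII variants.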