From: Chris Black
To: Jakarta Commons Developers List
Date: Wed, 15 Feb 2006 17:06:18 -0600
Subject: Re: [Codec] accented character soundex revisited
Message-ID: <43F3B3EA.1060501@lotuscat.com>
In-Reply-To: <19B78354A4AA3E4287384F3D30933F8892D81C@MAIL1.seagull.nl>

I am running the latest; I even did a new svn co into a new directory to check:

mkdir codec2
svn co https://svn.apache.org/repos/asf/jakarta/commons/proper/codec/trunk
cd codec2
(set up build.properties to point to my junit.jar)
ant clean jar test

This gives me a failure in SoundexTest. I am running Sun Java 1.4.2_08_b03.
I am curious why it would fail for me but not for you; the only differences
are the Java version and perhaps the JUnit version (I am using v3.8.1).

Best,
Chris

Gary Gregory wrote:
>Hello Chris:
>
>Welcome to Codec development.
>
>One obvious thing: make sure you start with the latest from SVN:
>
>https://svn.apache.org/repos/asf/jakarta/commons/proper/codec/trunk
>
>(as opposed to the 1.3 sources)
>
>You mention unit test failures? When I run:
>
>ant clean jar test
>
>All unit tests pass.
>
>I am using Sun Java 1.4.2_10.
>
>Gary
>
>>-----Original Message-----
>>From: Chris Black [mailto:chris@lotuscat.com]
>>Sent: Wednesday, February 15, 2006 1:28 PM
>>To: commons-dev@jakarta.apache.org
>>Subject: [Codec] accented character soundex revisited
>>
>>Over 18 months ago there was a thread on this list about the proper
>>handling of accented characters in the Soundex encoder in commons-codec,
>>but it never seemed to get resolved. In addition, there are still
>>failing unit tests that reference this issue in the current version of
>>the code. As someone who uses this code, I'd like to see all unit tests
>>passing, so I've done some investigation.
>>
>>As a refresher, there were three options discussed for the behavior of
>>the Soundex codec when it sees an accented character:
>>1) Throw an IllegalArgumentException
>>2) Drop it silently
>>3) Replace it with the equivalent unaccented character
>>
>>Right now the code drops it silently, but the unit tests are expecting
>>an IllegalArgumentException. The code in Soundex.map(char ch) seems to
>>be trying to throw this exception, but that will never happen: the
>>characters passed to it from Soundex.soundex come from a String that has
>>gone through SoundexUtils.clean(String str), which removes all characters
>>that fail a Character.isLetter(char ch) check (accented chars fail this
>>check; I, erm, checked). This means that if we want to throw an
>>IllegalArgumentException, it must be done in SoundexUtils.clean, not in
>>Soundex.map.
>>
>>I think either behavior 1 or 2 (drop silently, which is what we
>>currently do) would be easy to implement; we could then change the unit
>>tests to match that behavior so that all unit tests in commons-codec pass.
>>
>>If someone lets me know which behavior is desired, I will submit a patch.
>>Note that behavior 2 only requires either removing the test cases or
>>changing them to expect the same encoding as an empty string.
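A minimal Java sketch of the three behaviors listed above, applied in a clean-style pre-processing step, which the analysis above identifies as the only place behavior 1 could take effect. The class `AccentPolicyDemo`, its `Policy` enum and its `clean(String, Policy)` method are hypothetical illustrations, not commons-codec API. Option 3 assumes `java.text.Normalizer`, which requires Java 6 or later, so this sketch would not run on the Java 1.4 releases mentioned in this thread.

```java
import java.text.Normalizer;

/**
 * Hypothetical sketch, not commons-codec API: three ways a Soundex
 * pre-processing step could treat letters outside A-Z.
 */
public class AccentPolicyDemo {

    /** The three behaviors discussed on the list. */
    public enum Policy { THROW, DROP, REPLACE }

    /**
     * Upper-cases the input and keeps only the letters A-Z. Non-letters
     * (spaces, digits, punctuation) are always skipped; other letters,
     * such as an accented E, are handled according to the policy.
     */
    public static String clean(String str, Policy policy) {
        if (str == null || str.length() == 0) {
            return str;
        }
        String input = str;
        if (policy == Policy.REPLACE) {
            // Behavior 3: decompose accented letters into base letter plus
            // combining mark, then strip the marks (Unicode category M).
            input = Normalizer.normalize(input, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}", "");
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char ch = Character.toUpperCase(input.charAt(i));
            if (ch >= 'A' && ch <= 'Z') {
                out.append(ch);
            } else if (Character.isLetter(ch) && policy == Policy.THROW) {
                // Behavior 1: reject letters the Soundex table cannot map.
                throw new IllegalArgumentException("Unmappable letter: " + ch);
            }
            // Behavior 2 (and any letter REPLACE could not reduce to A-Z):
            // the character is silently skipped.
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String word = "Caf\u00e9";
        System.out.println("DROP:    " + clean(word, Policy.DROP));
        System.out.println("REPLACE: " + clean(word, Policy.REPLACE));
        try {
            clean(word, Policy.THROW);
        } catch (IllegalArgumentException e) {
            System.out.println("THROW:   " + e.getMessage());
        }
    }
}
```

Under these assumptions, `clean("Caf\u00e9", Policy.DROP)` yields `CAF`, `Policy.REPLACE` yields `CAFE`, and `Policy.THROW` raises `IllegalArgumentException`. Non-letters are skipped under every policy, and letters with no canonical decomposition (for example U+00F8) are still dropped under REPLACE.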
>>
>>References:
>>http://issues.apache.org/bugzilla/show_bug.cgi?id=29080
>>http://www.mail-archive.com/commons-dev@jakarta.apache.org/msg41974.html
>>
>>Best,
>>Chris
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: commons-dev-help@jakarta.apache.org