commons-dev mailing list archives

From Chris Black <ch...@lotuscat.com>
Subject Re: [Codec] accented character soundex revisited
Date Wed, 15 Feb 2006 23:06:18 GMT
I am running the latest; I even did a fresh svn co into a new directory to
check:
svn co https://svn.apache.org/repos/asf/jakarta/commons/proper/codec/trunk codec2
cd codec2
(set up build.properties to point to my junit.jar)
ant clean jar test

This gives me a failure on SoundexTest. I am running Sun Java 1.4.2_08_b03.

I am curious why it would fail for me but not for you; the only
differences are the Java version and perhaps the JUnit version (I am using
v3.8.1).

Best,
Chris

Gary Gregory wrote:

>Hello Chris:
>
>Welcome to Codec development.
>
>One obvious thing: Make sure you start with latest from SVN:
>
>https://svn.apache.org/repos/asf/jakarta/commons/proper/codec/trunk
>
>(as opposed to the 1.3 sources)
>
>
>You mention unit test failures? When I run:
>
>ant clean jar test
>
>All unit tests pass.
>
>I am using Sun Java 1.4.2_10.
>
>Gary
>
>  
>
>>-----Original Message-----
>>From: Chris Black [mailto:chris@lotuscat.com]
>>Sent: Wednesday, February 15, 2006 1:28 PM
>>To: commons-dev@jakarta.apache.org
>>Subject: [Codec] accented character soundex revisited
>>
>>Over 18 months ago there was a thread on this list about the proper
>>handling of accented characters in the Soundex encoder in commons-codec
>>but it never seemed to get resolved. In addition, there are still
>>failing unit tests that reference this issue in the current version of
>>the code. As someone who uses this code, I'd like to see all unit tests
>>passing, so I've done some investigation.
>>As a refresher, there were three options discussed for the behavior of
>>the Soundex codec when it sees an accented character:
>>1) Throw an IllegalArgumentException
>>2) Drop it silently
>>3) Replace it with the equivalent unaccented character
>>
>>Right now the code drops it silently, but the unit tests are expecting
>>an IllegalArgumentException. The code in Soundex.map(char ch) seems to
>>be trying to throw this exception, but it will never happen because the
>>characters passed to it from Soundex.soundex are from a String that has
>>gone through SoundexUtils.clean(String str) which removes all characters
>>that fail a Character.isCharacter(char ch) check (accented chars fail
>>this check, I, erm, checked). This means if we want to throw an
>>IllegalArgumentException it must be done in SoundexUtils.clean, not
>>Soundex.map.
>>
>>I think either behaviors 1 or 2 (drop silently, which is what we
>>currently do) would be easy to implement and then change the unit tests
>>to match the behavior so all unit tests on commons-codec pass.
>>
>>If someone lets me know which behavior is desired I will submit a patch.
>>Note that behavior 2 only requires either removing the test cases or
>>changing them to expect the same encoding as an empty string.
>>
>>References:
>>http://issues.apache.org/bugzilla/show_bug.cgi?id=29080
>>http://www.mail-archive.com/commons-dev@jakarta.apache.org/msg41974.html
>>
>>Best,
>>Chris
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: commons-dev-help@jakarta.apache.org
>>
>>    
>>
>
>
>
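
For illustration, the three behaviors listed in the quoted message above
can be sketched roughly as follows. This is not the commons-codec source:
the class and method names are hypothetical, and the plain ASCII letter
test is an assumption standing in for whatever check the real
SoundexUtils.clean performs. The point is only that once a cleaning step
filters the input, a later per-character map step never sees the
characters it is supposed to reject.

```java
import java.text.Normalizer;

// Hypothetical sketch; none of these names come from commons-codec.
class AccentPolicySketch {

    // Assumed filter: only unaccented ASCII letters count as valid input.
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Behavior 2: silently drop everything that is not an ASCII letter.
    static String cleanDropping(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isAsciiLetter(c)) {
                sb.append(Character.toUpperCase(c));
            }
        }
        return sb.toString();
    }

    // Behavior 1: reject letters outside ASCII inside the cleaning step,
    // because nothing downstream will ever see them.
    static String cleanStrict(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetter(c) && !isAsciiLetter(c)) {
                throw new IllegalArgumentException("Unsupported character: " + c);
            }
        }
        return cleanDropping(s);
    }

    // Behavior 3: decompose accented letters (NFD), then drop the
    // combining marks along with all other non-ASCII-letter characters.
    static String cleanFolding(String s) {
        return cleanDropping(Normalizer.normalize(s, Normalizer.Form.NFD));
    }

    // Stand-in for a per-character map step that throws on unexpected input.
    static char map(char ch) {
        if (ch < 'A' || ch > 'Z') {
            throw new IllegalArgumentException("Not mapped: " + ch);
        }
        return ch;
    }

    public static void main(String[] args) {
        String input = "caf\u00e9"; // escape avoids source-encoding surprises
        String cleaned = cleanDropping(input);
        for (int i = 0; i < cleaned.length(); i++) {
            map(cleaned.charAt(i)); // never throws: input was already filtered
        }
        System.out.println(cleaned);
        System.out.println(cleanFolding(input));
    }
}
```

With this arrangement only cleanStrict can raise the exception for an
accented letter, which matches the observation that the check has to live
in the cleaning step rather than in the map step.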

