commons-dev mailing list archives

From Chris Black <ch...@lotuscat.com>
Subject Re: [Codec] accented character soundex revisited
Date Wed, 15 Feb 2006 23:25:56 GMT
I compared your output to mine, and one thing I noticed is that my 
SoundexTest line is different: only 19 tests run, with 1 failure 
(probably just because it quits after the first failure), where yours 
shows 25 tests run and no failures. I am running SuSE Linux 9.2 with 
Ant 1.6.2. I worry that it is something stupid, but I've checked 
everything I can think of ($CLASSPATH, versions, etc.). I did notice you 
have a full commons path where I only have codec. My Soundex.java has:
@version $Id: Soundex.java 366897 2006-01-07 19:57:36Z tobrien $
SoundexUtils is 161350 2005-04-14 20:39:46Z ggregory with:
    static String clean(String str) {
        if (str == null || str.length() == 0) {
            return str;
        }
        int len = str.length();
        char[] chars = new char[len];
        int count = 0;
        for (int i = 0; i < len; i++) {
            if (Character.isLetter(str.charAt(i))) {
                chars[count++] = str.charAt(i);
            }
        }
        if (count == len) {
            return str.toUpperCase();
        }
        return new String(chars, 0, count).toUpperCase();
    }
and the SoundexTest has:
    /**
     * Fancy characters are not mapped by the default US mapping.
     *
     * http://issues.apache.org/bugzilla/show_bug.cgi?id=29080
     */
    public void testUsMappingOWithDiaeresis() {
        assertEquals("O000", this.getEncoder().encode("o"));
        try {
            assertEquals("Ö000", this.getEncoder().encode("ö"));
            fail("Expected IllegalArgumentException not thrown");
        } catch (IllegalArgumentException e) {
            // expected
        }
    }
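
For anyone who wants to poke at this outside the Ant build, here is a 
minimal standalone sketch. The class name CleanDemo and the sample inputs 
are made up for illustration; only clean() is taken from the SoundexUtils 
revision quoted above. The accented character is written as a Unicode 
escape so the result does not depend on how the source file is decoded, 
which may or may not matter here.

```java
// Hypothetical standalone check; only clean() comes from the message above.
class CleanDemo {

    // Copied from the SoundexUtils revision quoted above.
    static String clean(String str) {
        if (str == null || str.length() == 0) {
            return str;
        }
        int len = str.length();
        char[] chars = new char[len];
        int count = 0;
        for (int i = 0; i < len; i++) {
            if (Character.isLetter(str.charAt(i))) {
                chars[count++] = str.charAt(i);
            }
        }
        if (count == len) {
            return str.toUpperCase();
        }
        return new String(chars, 0, count).toUpperCase();
    }

    public static void main(String[] args) {
        // Digits and punctuation fail Character.isLetter and are dropped.
        System.out.println(clean("a1b-c"));                                // ABC
        // U+00F6 (o with diaeresis) is a lowercase letter in Unicode.
        System.out.println(Character.isLetter('\u00f6'));                  // true
        // Print the code point in hex so the console encoding is irrelevant.
        System.out.println(Integer.toHexString(clean("\u00f6").charAt(0))); // d6
    }
}
```

Printing the code point rather than the character keeps the check 
readable on a terminal that cannot display the accented letter.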

Here is my output:

cblack@getafix:~/projects/codec2/trunk> ant clean jar test
Buildfile: build.xml

clean:

init:
     [echo] -------- commons-codec 1.4-dev --------

prepare:
    [mkdir] Created dir: /export/people/cblack/projects/codec2/trunk/target
    [mkdir] Created dir: /export/people/cblack/projects/codec2/trunk/target/classes
    [mkdir] Created dir: /export/people/cblack/projects/codec2/trunk/target/conf
    [mkdir] Created dir: /export/people/cblack/projects/codec2/trunk/target/tests
    [mkdir] Created dir: /export/people/cblack/projects/codec2/trunk/target/test-reports

static:
     [copy] Copying 1 file to /export/people/cblack/projects/codec2/trunk/target/conf

compile:
    [javac] Compiling 24 source files to /export/people/cblack/projects/codec2/trunk/target/classes
     [copy] Copying 6 files to /export/people/cblack/projects/codec2/trunk/target/classes

jar:
    [mkdir] Created dir: /export/people/cblack/projects/codec2/trunk/dist
    [mkdir] Created dir: /export/people/cblack/projects/codec2/trunk/target/classes/META-INF
     [copy] Copying 1 file to /export/people/cblack/projects/codec2/trunk/target/classes/META-INF
      [jar] Building jar: /export/people/cblack/projects/codec2/trunk/dist/commons-codec-1.4-dev.jar

init:
     [echo] -------- commons-codec 1.4-dev --------

prepare:

static:

compile:

compile.tests:
    [javac] Compiling 17 source files to /export/people/cblack/projects/codec2/trunk/target/tests
    [javac] /export/people/cblack/projects/codec2/trunk/src/test/org/apache/commons/codec/language/SoundexTest.java:299: warning: getMaxLength() in org.apache.commons.codec.language.Soundex has been deprecated
    [javac]         soundex.setMaxLength(soundex.getMaxLength());
    [javac]                                     ^
    [javac] /export/people/cblack/projects/codec2/trunk/src/test/org/apache/commons/codec/language/SoundexTest.java:299: warning: setMaxLength(int) in org.apache.commons.codec.language.Soundex has been deprecated
    [javac]         soundex.setMaxLength(soundex.getMaxLength());
    [javac]                ^
    [javac] /export/people/cblack/projects/codec2/trunk/src/test/org/apache/commons/codec/language/SoundexTest.java:305: warning: setMaxLength(int) in org.apache.commons.codec.language.Soundex has been deprecated
    [javac]         soundex.setMaxLength(2);
    [javac]                ^
    [javac] /export/people/cblack/projects/codec2/trunk/src/test/org/apache/commons/codec/net/URLCodecTest.java:48: warning: getEncoding() in org.apache.commons.codec.net.URLCodec has been deprecated
    [javac]         assertEquals(urlCodec.getEncoding(), urlCodec.getDefaultCharset());
    [javac]                              ^
    [javac] 4 warnings

test:
    [junit] Running org.apache.commons.codec.StringEncoderComparatorTest
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.037 sec
    [junit] Running org.apache.commons.codec.binary.Base64Test
    [junit] Tests run: 24, Failures: 0, Errors: 0, Time elapsed: 0.079 sec
    [junit] Running org.apache.commons.codec.binary.BinaryCodecTest
    [junit] Tests run: 13, Failures: 0, Errors: 0, Time elapsed: 0.037 sec
    [junit] Running org.apache.commons.codec.binary.HexTest
    [junit] Tests run: 12, Failures: 0, Errors: 0, Time elapsed: 0.086 sec
    [junit] Running org.apache.commons.codec.digest.DigestUtilsTest
    [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.279 sec
    [junit] Running org.apache.commons.codec.language.DoubleMetaphoneTest
    [junit] Tests run: 13, Failures: 0, Errors: 0, Time elapsed: 0.145 sec
    [junit] Running org.apache.commons.codec.language.MetaphoneTest
    [junit] Tests run: 32, Failures: 0, Errors: 0, Time elapsed: 0.132 sec
    [junit] Running org.apache.commons.codec.language.RefinedSoundexTest
    [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.015 sec
    [junit] Running org.apache.commons.codec.language.SoundexTest
    [junit] Tests run: 19, Failures: 1, Errors: 0, Time elapsed: 0.044 sec

BUILD FAILED
/export/people/cblack/projects/codec2/trunk/build.xml:180: Test org.apache.commons.codec.language.SoundexTest failed

Total time: 15 seconds
----

I am very curious to figure out what the difference is. If I can't 
reproduce your results, I imagine getting my other patch in for crypto 
integer coding (#38657) will be more difficult :)
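
As a footnote to the three behaviors listed in the original post quoted 
further down (throw, drop, replace with the unaccented equivalent), here 
is a rough sketch of what each could look like as a pre-processing step. 
This is not commons-codec code: the class and method names are 
hypothetical, and option 3 leans on java.text.Normalizer, which only 
exists from Java 6 onward and so would not be available on the 1.4.2 
JVMs mentioned in this thread.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of commons-codec.
class AccentPolicyDemo {

    private static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Option 1: refuse any character outside A-Z / a-z.
    static String reject(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!isAsciiLetter(c)) {
                throw new IllegalArgumentException(
                        "Not mapped: U+" + Integer.toHexString(c));
            }
        }
        return s;
    }

    // Option 2: silently drop any character outside A-Z / a-z.
    static String drop(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isAsciiLetter(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Option 3: decompose (NFD) and strip the combining marks, so that
    // U+00F6 becomes a plain 'o'. Requires Java 6 or later.
    static String unaccent(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String name = "J\u00f6rg";
        System.out.println(drop(name));     // Jrg
        System.out.println(unaccent(name)); // Jorg
        try {
            reject(name);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Option 3 as sketched only handles letters that have a canonical 
decomposition; characters without one pass through unchanged and would 
still need one of the other two policies.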

Best,
Chris



Gary Gregory wrote:

>This is odd indeed.
>
>I use Ant 1.6.5 and JUnit 3.8.1. 
>
>Here is my Ant build output:
>
>Microsoft Windows XP [Version 5.1.2600]
>(C) Copyright 1985-2001 Microsoft Corp.
>
>C:\svn-store\jakarta\commons\codec>ant jar test
>Buildfile: build.xml
>
>init:
>     [echo] -------- commons-codec 1.4-dev --------
>
>prepare:
>
>static:
>
>compile:
>    [javac] Compiling 1 source file to C:\svn-store\jakarta\commons\codec\target\classes
>Terminate batch job (Y/N)? y
>
>C:\svn-store\jakarta\commons\codec>ant clean jar test
>Buildfile: build.xml
>
>clean:
>   [delete] Deleting directory C:\svn-store\jakarta\commons\codec\target
>   [delete] Deleting directory C:\svn-store\jakarta\commons\codec\dist
>
>init:
>     [echo] -------- commons-codec 1.4-dev --------
>
>prepare:
>    [mkdir] Created dir: C:\svn-store\jakarta\commons\codec\target
>    [mkdir] Created dir: C:\svn-store\jakarta\commons\codec\target\classes
>    [mkdir] Created dir: C:\svn-store\jakarta\commons\codec\target\conf
>    [mkdir] Created dir: C:\svn-store\jakarta\commons\codec\target\tests
>    [mkdir] Created dir: C:\svn-store\jakarta\commons\codec\target\test-reports
>
>static:
>     [copy] Copying 1 file to C:\svn-store\jakarta\commons\codec\target\conf
>
>compile:
>    [javac] Compiling 24 source files to C:\svn-store\jakarta\commons\codec\target\classes
>     [copy] Copying 6 files to C:\svn-store\jakarta\commons\codec\target\classes
>
>jar:
>    [mkdir] Created dir: C:\svn-store\jakarta\commons\codec\dist
>    [mkdir] Created dir: C:\svn-store\jakarta\commons\codec\target\classes\META-INF
>     [copy] Copying 1 file to C:\svn-store\jakarta\commons\codec\target\classes\META-INF
>      [jar] Building jar: C:\svn-store\jakarta\commons\codec\dist\commons-codec-1.4-dev.jar
>
>init:
>     [echo] -------- commons-codec 1.4-dev --------
>
>prepare:
>
>static:
>
>compile:
>
>compile.tests:
>    [javac] Compiling 17 source files to C:\svn-store\jakarta\commons\codec\target\tests
>    [javac] C:\svn-store\jakarta\commons\codec\src\test\org\apache\commons\codec\language\SoundexTest.java:299: warning: getMaxLength() in org.apache.commons.codec.language.Soundex has been deprecated
>    [javac]         soundex.setMaxLength(soundex.getMaxLength());
>    [javac]                                     ^
>    [javac] C:\svn-store\jakarta\commons\codec\src\test\org\apache\commons\codec\language\SoundexTest.java:299: warning: setMaxLength(int) in org.apache.commons.codec.language.Soundex has been deprecated
>    [javac]         soundex.setMaxLength(soundex.getMaxLength());
>    [javac]                ^
>    [javac] C:\svn-store\jakarta\commons\codec\src\test\org\apache\commons\codec\language\SoundexTest.java:305: warning: setMaxLength(int) in org.apache.commons.codec.language.Soundex has been deprecated
>    [javac]         soundex.setMaxLength(2);
>    [javac]                ^
>    [javac] C:\svn-store\jakarta\commons\codec\src\test\org\apache\commons\codec\net\URLCodecTest.java:48: warning: getEncoding() in org.apache.commons.codec.net.URLCodec has been deprecated
>    [javac]         assertEquals(urlCodec.getEncoding(), urlCodec.getDefaultCharset());
>    [javac]                              ^
>    [javac] 4 warnings
>
>test:
>    [junit] Running org.apache.commons.codec.StringEncoderComparatorTest
>    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.03 sec
>    [junit] Running org.apache.commons.codec.binary.Base64Test
>    [junit] Tests run: 24, Failures: 0, Errors: 0, Time elapsed: 0.07 sec
>    [junit] Running org.apache.commons.codec.binary.BinaryCodecTest
>    [junit] Tests run: 13, Failures: 0, Errors: 0, Time elapsed: 0.11 sec
>    [junit] Running org.apache.commons.codec.binary.HexTest
>    [junit] Tests run: 12, Failures: 0, Errors: 0, Time elapsed: 0.4 sec
>    [junit] Running org.apache.commons.codec.digest.DigestUtilsTest
>    [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 1.784 sec
>    [junit] Running org.apache.commons.codec.language.DoubleMetaphoneTest
>    [junit] Tests run: 13, Failures: 0, Errors: 0, Time elapsed: 0.36 sec
>    [junit] Running org.apache.commons.codec.language.MetaphoneTest
>    [junit] Tests run: 32, Failures: 0, Errors: 0, Time elapsed: 0.32 sec
>    [junit] Running org.apache.commons.codec.language.RefinedSoundexTest
>    [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.02 sec
>    [junit] Running org.apache.commons.codec.language.SoundexTest
>    [junit] Tests run: 25, Failures: 0, Errors: 0, Time elapsed: 0.02 sec
>    [junit] Running org.apache.commons.codec.net.BCodecTest
>    [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.311 sec
>    [junit] Running org.apache.commons.codec.net.QCodecTest
>    [junit] Tests run: 12, Failures: 0, Errors: 0, Time elapsed: 0.171 sec
>    [junit] Running org.apache.commons.codec.net.QuotedPrintableCodecTest
>    [junit] Tests run: 15, Failures: 0, Errors: 0, Time elapsed: 0.21 sec
>    [junit] Running org.apache.commons.codec.net.RFC1522CodecTest
>    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.02 sec
>    [junit] Running org.apache.commons.codec.net.URLCodecTest
>    [junit] Tests run: 16, Failures: 0, Errors: 0, Time elapsed: 0.141 sec
>
>BUILD SUCCESSFUL
>Total time: 1 minute 55 seconds
>
>Gary
>
>  
>
>>-----Original Message-----
>>From: Chris Black [mailto:chris@lotuscat.com]
>>Sent: Wednesday, February 15, 2006 3:06 PM
>>To: Jakarta Commons Developers List
>>Subject: Re: [Codec] accented character soundex revisited
>>
>>I am running the latest, I even did a new svn co into a new directory to
>>check:
>>mkdir codec2
>>svn co https://svn.apache.org/repos/asf/jakarta/commons/proper/codec/trunk
>>cd codec2
>>(set up build.properties to point to my junit.jar)
>>ant clean jar test
>>
>>Gives me a failure on SoundexTest. I am running Sun Java 1.4.2_08_b03.
>>
>>I am curious as to why it would fail for me but not you; the only
>>differences are the Java version and perhaps the JUnit version (I am
>>using v3.8.1).
>>
>>Best,
>>Chris
>>
>>Gary Gregory wrote:
>>
>>    
>>
>>>Hello Chris:
>>>
>>>Welcome to Codec development.
>>>
>>>One obvious thing: Make sure you start with latest from SVN:
>>>
>>>https://svn.apache.org/repos/asf/jakarta/commons/proper/codec/trunk
>>>
>>>(as opposed to the 1.3 sources)
>>>
>>>
>>>You mention unit test failures? When I run:
>>>
>>>ant clean jar test
>>>
>>>All unit tests pass.
>>>
>>>I am using Sun Java 1.4.2_10.
>>>
>>>Gary
>>>
>>>
>>>
>>>      
>>>
>>>>-----Original Message-----
>>>>From: Chris Black [mailto:chris@lotuscat.com]
>>>>Sent: Wednesday, February 15, 2006 1:28 PM
>>>>To: commons-dev@jakarta.apache.org
>>>>Subject: [Codec] accented character soundex revisited
>>>>
>>>>Over 18 months ago there was a thread on this list about the proper
>>>>handling of accented characters in the Soundex encoder in commons-codec,
>>>>but it never seemed to get resolved. In addition, there are still
>>>>failing unit tests that reference this issue in the current version of
>>>>the code. As someone who uses this code, I'd like to see all unit tests
>>>>passing, so I've done some investigation.
>>>>As a refresher, there were three options discussed for the behavior of
>>>>the Soundex codec when it sees an accented character:
>>>>1) Throw an IllegalArgumentException
>>>>2) Drop it silently
>>>>3) Replace it with the equivalent unaccented character
>>>>
>>>>Right now the code drops it silently, but the unit tests are expecting
>>>>an IllegalArgumentException. The code in Soundex.map(char ch) seems to
>>>>be trying to throw this exception, but it will never happen because the
>>>>characters passed to it from Soundex.soundex are from a String that has
>>>>gone through SoundexUtils.clean(String str), which removes all
>>>>characters that fail a Character.isLetter(char ch) check (accented chars
>>>>fail this check, I, erm, checked). This means if we want to throw an
>>>>IllegalArgumentException it must be done in SoundexUtils.clean, not
>>>>Soundex.map.
>>>>
>>>>I think either behavior 1 or 2 (drop silently, which is what we
>>>>currently do) would be easy to implement, and the unit tests could then
>>>>be changed to match the behavior so all unit tests on commons-codec pass.
>>>>
>>>>If someone lets me know which behavior is desired I will submit a patch.
>>>>Note that behavior 2 only requires either removing the test cases or
>>>>changing them to expect the same encoding as an empty string.
>>>>
>>>>References:
>>>>http://issues.apache.org/bugzilla/show_bug.cgi?id=29080
>>>>http://www.mail-archive.com/commons-dev@jakarta.apache.org/msg41974.html
>>>>
>>>>Best,
>>>>Chris
>>>>
>>>>
>>>>        
>>>>
>>>---------------------------------------------------------------------
>>>      
>>>
>>>>To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
>>>>For additional commands, e-mail: commons-dev-help@jakarta.apache.org



