commons-dev mailing list archives

From "Edelson, Justin"
Subject RE: [codec] Soudex issue with accented character.
Date Wed, 02 Jun 2004 14:55:39 GMT
The only "better" solution I can think of is to map the characters into their non-accented
equivalent. While I think it's important to state that the default Soundex implementation
is for English words, it would be nice to accommodate words with accented characters.
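As a rough illustration of the mapping idea, here is a minimal sketch that folds accented letters to their base letters before encoding. It assumes java.text.Normalizer (available since Java 6) is acceptable; the class and method names (AccentFolding, stripAccents) are hypothetical and not part of Commons Codec.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration only; not a Commons Codec class.
class AccentFolding {

    // Decompose each character into base letter + combining marks (NFD),
    // then drop the combining marks so only the base letter remains.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Escapes used so the source stays plain ASCII: o-umlaut, e-acute, c-cedilla.
        System.out.println(stripAccents("\u00f6 \u00e9 \u00e7"));
    }
}
```

The folded String could then be handed to any of the codecs. Letters with no canonical decomposition (for example "ø") pass through unchanged, so this does not remove the need to decide what to do with unmapped characters.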

My bigger concern is that the behavior is inconsistent between Soundex, Metaphone, & DoubleMetaphone.
Soundex will now throw an IllegalArgumentException, whereas Metaphone passes through the "bad"
character. DoubleMetaphone has support for two accented characters, C with Cedilla and N with

To the extent that I think the language codecs should be swappable components, it's a good
idea for the support to be consistent. To that end, a String containing such characters should
either cause an exception in every codec or in none of them.

Just my 2 cents.

-----Original Message-----
From: Gary Gregory
Sent: Sunday, May 23, 2004 8:37 PM
To: Jakarta Commons Developers List
Subject: [codec] Soudex issue with accented character.

Currently, "ö" or "é" in a String causes Soundex to throw an ArrayIndexOutOfBoundsException.

We can either:

(1) Throw a better Exception, like IllegalArgumentException: Only 'plain' letters are allowed.


(2) Ignore unmapped characters. This would work for "ö" and "é" since vowels are ignored
but this could cause bad encoding values for other chars like "ç".

AFAIK, you cannot ask if a character is a vowel or not.
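
The two options above can be sketched side by side. This is a simplified per-character lookup, not the actual Soundex source: the class and method names (UnmappedCharPolicy, mapStrict, mapLenient) are hypothetical, and the table is the standard American Soundex digit assignment for A to Z.

```java
// Hypothetical sketch of the two policies; not the Commons Codec implementation.
class UnmappedCharPolicy {

    // Soundex digit for each letter A..Z; '0' marks letters that get no code.
    private static final String MAPPING = "01230120022455012623010202";

    // Option (1): reject anything outside A-Z with a descriptive exception
    // instead of letting an out-of-range index escape.
    static char mapStrict(char ch) {
        char upper = Character.toUpperCase(ch);
        if (upper < 'A' || upper > 'Z') {
            throw new IllegalArgumentException("Only 'plain' letters are allowed: " + ch);
        }
        return MAPPING.charAt(upper - 'A');
    }

    // Option (2): treat any unmapped character like an uncoded letter.
    static char mapLenient(char ch) {
        char upper = Character.toUpperCase(ch);
        if (upper < 'A' || upper > 'Z') {
            return '0';
        }
        return MAPPING.charAt(upper - 'A');
    }
}
```

With the lenient variant, "ö" and "é" come out as '0', the same as plain vowels, but "ç" also comes out as '0' whereas "c" maps to '2'. That is the bad encoding value mentioned in option (2).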



To unsubscribe, e-mail:
For additional commands, e-mail: