commons-dev mailing list archives

From "C. Scott Ananian" <csc...@cscott.net>
Subject RE: [codec] Soundex issue with accented character.
Date Wed, 02 Jun 2004 15:02:02 GMT
On Wed, 2 Jun 2004, Edelson, Justin wrote:

> The only "better" solution I can think of is to map the characters into
> their non-accented equivalent. While I think it's important to state
> that the default Soundex implementation is for English words, it would
> be nice to accommodate words with accented characters.

I believe the 'standard' behavior is just to drop the accented character
from the soundex encoding.  The soundex algorithm already does this for
other 'quiet' characters (the vowels, plus H, W and Y).  (Note that two
words with the same accented characters will still match correctly even
if the accented characters are dropped.)
 --scott

                         ( http://cscott.net/ )
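
The two options discussed in this thread (mapping accented characters to
their unaccented equivalents, versus dropping them) can be compared with a
small sketch. This is not the Commons Codec implementation: the class
AccentSoundexSketch and its methods are hypothetical names, and the encoder
is a simplified American Soundex written only for illustration. It assumes
java.text.Normalizer (Java 6 or later) is available for the mapping step.

```java
import java.text.Normalizer;
import java.util.Locale;

final class AccentSoundexSketch {
    // Soundex digit for each letter A..Z; '0' marks letters that get no code
    // (the vowels plus H, W and Y).
    private static final String CODES = "01230120022455012623010202";

    // Option 1 (hypothetical helper): map accented letters to their base
    // letter by decomposing to NFD and deleting the combining marks.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Option 2 (hypothetical helper): drop every character outside A-Z,
    // which silently discards accented letters.
    static String dropNonAscii(String s) {
        return s.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
    }

    // Simplified American Soundex over whatever survives dropNonAscii.
    static String soundex(String word) {
        String s = dropNonAscii(word);
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char last = CODES.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            char code = CODES.charAt(c - 'A');
            if (code != '0' && code != last) {
                out.append(code);
            }
            // H and W do not separate two letters that share a code.
            if (c != 'H' && c != 'W') {
                last = code;
            }
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String cecile = "C\u00e9cile";   // "Cecile" with e-acute
        String cecille = "C\u00e9cille"; // variant spelling, same accent
        System.out.println(soundex(cecile));               // dropped: C400
        System.out.println(soundex(cecille));              // dropped: C400
        System.out.println(soundex(stripAccents(cecile))); // mapped:  C240
    }
}
```

In this sketch the two spellings with the same accent still receive the
same code when the accent is dropped, but that code differs from the one
for the unaccented spelling, because the dropped vowel no longer separates
the two C's. Mapping first avoids that, at the cost of an extra
normalization step.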
