commons-dev mailing list archives

From "Edelson, Justin" <Justin.Edel...@mtvi.com>
Subject RE: [codec] Soundex issue with accented character.
Date Wed, 02 Jun 2004 15:11:46 GMT
That's not the behavior in either the latest [codec] release or HEAD.
Can you point to where this 'standard' behavior is documented? Neither
the National Archives documentation nor the NIST source code describes
it.
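
To make the two options in this thread concrete, here is a minimal sketch of
both preprocessing steps: mapping accented characters to their non-accented
equivalents, and dropping them outright. The class and method names are
hypothetical and are not part of [codec]. It assumes java.text.Normalizer is
available, and it only covers letters that decompose into a base letter plus
combining marks (letters such as 'ø' or 'ß' do not).

```java
import java.text.Normalizer;

// Hypothetical helper for illustration only; not part of commons-codec.
public class AccentPreprocessing {

    // Option 1: map each accented letter to its base letter ('é' -> 'e').
    // NFD decomposition splits 'é' into 'e' + U+0301; the combining marks
    // (Unicode category M) are then removed.
    public static String mapToUnaccented(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Option 2: drop every character that is not a plain A-Z / a-z letter,
    // so a precomposed accented letter disappears entirely.
    public static String dropAccented(String s) {
        return s.replaceAll("[^A-Za-z]", "");
    }

    public static void main(String[] args) {
        String vowelCase = "Ren\u00e9e";      // "Renée"
        String consonantCase = "Fa\u00e7ade"; // "Façade"

        System.out.println(mapToUnaccented(vowelCase));     // Renee
        System.out.println(dropAccented(vowelCase));        // Rene
        System.out.println(mapToUnaccented(consonantCase)); // Facade
        System.out.println(dropAccented(consonantCase));    // Faade
    }
}
```

Either result would then be passed to the Soundex encoder. The two options can
produce different input for the encoder, for example with 'ç', where mapping
keeps a 'c' that dropping loses.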

> -----Original Message-----
> From: C. Scott Ananian [mailto:cscott@cscott.net] 
> Sent: Wednesday, June 02, 2004 11:02 AM
> To: Jakarta Commons Developers List
> Subject: RE: [codec] Soundex issue with accented character.
> 
> 
> On Wed, 2 Jun 2004, Edelson, Justin wrote:
> 
> > The only "better" solution I can think of is to map the characters
> > into their non-accented equivalents. While I think it's important to
> > state that the default Soundex implementation is for English words,
> > it would be nice to accommodate words with accented characters.
> 
> I believe the 'standard' behavior is just to drop the accented
> character from the Soundex encoding. The Soundex algorithm typically
> already does this for other 'quiet' characters. (Note that two words
> with accented characters will still match correctly even if the
> accented characters are dropped.)  --scott

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org