commons-dev mailing list archives

From Gary Gregory <GGreg...@seagullsoftware.com>
Subject RE: [codec] Testing Cologne Phonetic
Date Tue, 22 Feb 2011 04:30:08 GMT
> -----Original Message-----
> From: F Mue [mailto:webmaster@genealogie-konzepte.net]
> Sent: Monday, February 21, 2011 08:38
> To: Gary Gregory
> Subject: Re: [codec] Testing Cologne Phonetic
> 
> Hi Gary,
> 
> I don't know if there is an implementation you can take as a reference.
> I found two other implementations:
> * a PHP implementation:
>    http://www.magdev.de/text_colognephonetic/
> * a Perl implementation included in the appendix of a thesis:
>    http://www.uni-koeln.de/phil-fak/phonetik/Lehre/MA-Arbeiten/Martin_Wilz.pdf
> but I don't know if they are producing correct codes.
> 
> In the meantime I modified the implementation I used to create Cologne
> codes and created a new dataset on SourceForge (release 0.11.2). The
> modified implementation is included in the readme file.

Hi Franz,

Thank you for producing this new version. It helped me find a bug in our implementation!

I have a question about the name on line 17: "Aaclan"

The data file shows code 0456 for this name and our implementation produces code 0856.

When I walk through the table http://de.wikipedia.org/wiki/K%C3%B6lner_Phonetik#Buchstabencodes
I think the output should be 0856.

Can you confirm which is the correct value?

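Here is how I interpret the table for the letter "C" (which comes before an "L" and is
not at the start of the name):

None of the earlier cases apply until "nicht vor A, H, K, O, Q, U, X" ("not before
A, H, K, O, Q, U, X"), which gives code 8.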
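
The reading of the "C" row above can be checked with a small Java sketch of the table on the Wikipedia page. This is not the Commons Codec implementation: the class name ColognePhoneticSketch and its methods are hypothetical, and the sketch assumes plain ASCII input (no umlauts or ß).

```java
import java.util.Locale;

// Hypothetical sketch of the Wikipedia letter-code table; not the Commons Codec class.
// Assumption: input consists of plain ASCII letters only.
class ColognePhoneticSketch {

    private static boolean in(char c, String set) {
        return set.indexOf(c) >= 0;
    }

    /** Code for the letter 'C' at index i of the upper-cased name s. */
    static char codeForC(String s, int i) {
        char prev = i > 0 ? s.charAt(i - 1) : '\0';
        char next = i + 1 < s.length() ? s.charAt(i + 1) : '\0';
        if (i == 0) {
            // Initial C: 4 before A, H, K, L, O, Q, R, U, X; otherwise 8.
            return in(next, "AHKLOQRUX") ? '4' : '8';
        }
        if (in(prev, "SZ")) {
            return '8'; // C after S or Z
        }
        // Non-initial C: 4 before A, H, K, O, Q, U, X; otherwise 8.
        return in(next, "AHKOQUX") ? '4' : '8';
    }

    static String encode(String name) {
        String s = name.toUpperCase(Locale.ROOT);
        StringBuilder raw = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            char prev = i > 0 ? s.charAt(i - 1) : '\0';
            char next = i + 1 < s.length() ? s.charAt(i + 1) : '\0';
            if (in(c, "AEIJOUY")) raw.append('0');
            else if (c == 'B') raw.append('1');
            else if (c == 'P') raw.append(next == 'H' ? '3' : '1');
            else if (in(c, "DT")) raw.append(in(next, "CSZ") ? '8' : '2');
            else if (in(c, "FVW")) raw.append('3');
            else if (in(c, "GKQ")) raw.append('4');
            else if (c == 'C') raw.append(codeForC(s, i));
            else if (c == 'X') raw.append(in(prev, "CKQ") ? "8" : "48");
            else if (c == 'L') raw.append('5');
            else if (in(c, "MN")) raw.append('6');
            else if (c == 'R') raw.append('7');
            else if (in(c, "SZ")) raw.append('8');
            // 'H' and any other character contribute no digit.
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char d = raw.charAt(i);
            if (i > 0 && d == raw.charAt(i - 1)) continue; // collapse repeated digits
            if (d == '0' && i > 0) continue;               // keep a 0 only at the start
            out.append(d);
        }
        return out.toString();
    }
}
```

Under these assumptions, ColognePhoneticSketch.encode("Aaclan") walks A=0, A=0, C=8, L=5, A=0, N=6 and returns "0856" after collapsing the repeated 0 and dropping the later 0.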

Thank you,
Gary

> 
> In my spare time project on phonetic codes for German family names (see
> http://www.genealogie-konzepte.net/chamer-phonetik) I just wanted to
> show that Soundex as well as Metaphone and Cologne aren't suitable at
> all to find similar names. Cham phonetics and Phonet are better.

That is cool. Do you have any interest in contributing Java versions of these algorithms to
Apache Commons Codec?

Thank you,
Gary

> 
> 
> Franz
> 
> 
> 
> 
> On 21.02.2011 01:09, Gary Gregory wrote:
> > Maybe you should regenerate the file using Commons Codec :)
> >
> > I still want data to check against. Is there a canonical implementation
> > out there that you know of?
> >
> > Gary
> >
