commons-dev mailing list archives

From Oliver Heger <oliver.he...@oliver-heger.de>
Subject Re: [codec] Large test data set!
Date Wed, 26 Jan 2011 20:16:44 GMT
On 26.01.2011 02:36, Gary Gregory wrote:
>> -----Original Message-----
>> From: Oliver Heger [mailto:oliver.heger@oliver-heger.de]
>> Sent: Tuesday, January 25, 2011 15:19
>> To: Commons Developers List
>> Subject: Re: [codec] Large test data set!
>>
>> On 25.01.2011 21:01, Gary Gregory wrote:
>>> Hi All:
>>>
>>> I just found a data set that I would like to integrate with [codec] to
>>> test the language package:
>>>
>>> http://sourceforge.net/projects/familynamephon/
>>>
>>> The test data file contains 837K German names (37MB) in a text file and
>>> encodings in Cham (?) phonetics, Cologne phonetics, Metaphone, and Soundex.
>>>
>>> I have no idea how long it would take to run a test for our language
>>> encoders on this but I imagine making it an optional unit test. How do you
>>> do THAT in Maven?
>>>
>>> The data is covered (I think, I do not read German) by this license:
>>> http://www.opendatacommons.org/licenses/odbl/1.0/
>>
>> As a native German speaker, I can confirm that the license is indeed
>> the Open Database License, which can be found at the URL you provided.
>
> Can we include the data file in our tests? The PDF describing the file?
>
> Thank you,
> Gary

Well, IANAL.

But if I understand the license correctly, according to paragraph 3 we 
should be allowed to use the data as part of our tests and distribute 
it. We have to adhere to the usage conditions defined in paragraph 4, so 
we would have to add a note to our NOTICE.txt.

However, it will probably do no harm to ask at legal@.

Oliver

>
>>
>> Cham phonetics seems to be a special algorithm for encoding names. [1]
>> contains more background information about it (unfortunately also in
>> German). According to this page the name stems from a region in Bavaria.
>> You can find a PHP implementation of this algorithm in [2].
>>
>> HTH
>> Oliver
>>
>> [1] http://www.genealogie-konzepte.net/chamer-phonetik
>> [2] http://www.genealogie-konzepte.net/chamer-phonetik/implementierung
>>
>>>
>>> Thoughts?
>>> Gary Gregory
>>> Senior Software Engineer
>>> Rocket Software
>>> 3340 Peachtree Road, Suite 820 * Atlanta, GA 30326 * USA
>>> Tel: +1.404.760.1560
>>> Email: ggregory@seagullsoftware.com
>>> Web: http://www.seagull.rocketsoftware.com/
>>>
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org

