commons-dev mailing list archives

From: Gary Gregory
Subject: [codec] Large test data set!
Date: Tue, 25 Jan 2011 20:01:50 GMT

Hi All:

I just found a data set that I would like to integrate with [codec] to test the language package:

The test data is a 37MB text file containing 837K German names, along with their encodings
in Cham (?) phonetics, Cologne phonetics, Metaphone, and Soundex.

I have no idea how long it would take to run our language encoders against this, so
I imagine making it an optional unit test. How do you do THAT in Maven?
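
One possible shape for such an opt-in check, as a rough sketch only: the test body runs only when a system property is set, so a default build skips it. The property name codec.largeData, the class name, and the tab-separated "name, expected code" layout are all assumptions for illustration; the real file layout is not known here, and how the flag reaches the test JVM depends on the build configuration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.function.Function;

public class OptionalLargeDataCheck {

    // Hypothetical flag name; opt in with -Dcodec.largeData=true.
    static final String FLAG = "codec.largeData";

    // True only when the opt-in system property is set to "true".
    static boolean enabled() {
        return Boolean.getBoolean(FLAG);
    }

    // Counts lines where the encoder's output differs from the expected code.
    // Assumed layout: name <TAB> expected code. Lines with fewer columns are ignored.
    static int countMismatches(BufferedReader in, Function<String, String> encoder)
            throws IOException {
        int mismatches = 0;
        String line;
        while ((line = in.readLine()) != null) {
            String[] cols = line.split("\t");
            if (cols.length < 2) {
                continue;
            }
            if (!cols[1].equals(encoder.apply(cols[0]))) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) throws IOException {
        if (!enabled()) {
            System.out.println("Large data check skipped; set -D" + FLAG + "=true to run it.");
            return;
        }
        // Stand-in data and a stand-in encoder (first letter only), not a real codec encoder.
        BufferedReader sample = new BufferedReader(new StringReader("Meier\tM\nSchmidt\tS\n"));
        int bad = countMismatches(sample, name -> name.substring(0, 1));
        System.out.println("Mismatches: " + bad);
    }
}
```

In a real test the stand-in encoder would be replaced by one of the [codec] language encoders, and the reader would be opened on the data file instead of an in-memory string.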

The data is covered (I think, I do not read German) by this license:

Gary Gregory
Senior Software Engineer
Rocket Software
3340 Peachtree Road, Suite 820 * Atlanta, GA 30326 * USA
Tel: +1.404.760.1560
