commons-dev mailing list archives

From Niall Pemberton <niall.pember...@gmail.com>
Subject Re: [codec] Large test data set!
Date Wed, 26 Jan 2011 10:45:43 GMT
On Tue, Jan 25, 2011 at 8:01 PM, Gary Gregory
<GGregory@seagullsoftware.com> wrote:
> Hi All:
>
> I just found a data set that I would like to integrate with [codec] to test the language package:
>
> http://sourceforge.net/projects/familynamephon/
>
> The test data file contains 837K German names (37MB) in a text file and encodings in Cham (?) phonetics, Cologne phonetics, Metaphone, and Soundex.
>
> I have no idea how long it would take to run a test for our language encoders on this but I imagine making it an optional unit test. How do you do THAT in Maven?
>

One way would be to have a separate profile which configures the
surefire-plugin to only include that test and have the *normal*
surefire-plugin config exclude it - lang has this for the
RandomUtilsFreqTest in the 2.x branch:

http://svn.apache.org/repos/asf/commons/proper/lang/tags/LANG_2_6/pom.xml
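
As a rough sketch of that arrangement (not copied from the lang pom), the
fragment below excludes a large-data test from the normal build and
re-enables it in an opt-in profile. The test class name
LargeDataSetTest and the profile id large-data-test are hypothetical
placeholders, and the combine.self="override" attributes assume that
inherited include/exclude lists would otherwise be merged into the
profile's configuration; checking the result with
mvn help:effective-pom -P large-data-test would confirm that.

```xml
<!-- Fragment of pom.xml; LargeDataSetTest and large-data-test are hypothetical names. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <includes>
          <include>**/*Test.java</include>
        </includes>
        <excludes>
          <!-- Normal build: skip the slow, large-data test. -->
          <exclude>**/LargeDataSetTest.java</exclude>
        </excludes>
      </configuration>
    </plugin>
  </plugins>
</build>

<profiles>
  <profile>
    <id>large-data-test</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <configuration>
            <!-- Replace the default lists so only the large-data test runs. -->
            <includes combine.self="override">
              <include>**/LargeDataSetTest.java</include>
            </includes>
            <excludes combine.self="override">
              <exclude>**/*$*</exclude>
            </excludes>
          </configuration>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```

With something like this in place, a plain "mvn test" would skip the big
test and "mvn test -P large-data-test" would run only that one.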

Niall

> The data is covered (I think, I do not read German) by this license: http://www.opendatacommons.org/licenses/odbl/1.0/
>
> Thoughts?
> Gary Gregory
> Senior Software Engineer
> Rocket Software
> 3340 Peachtree Road, Suite 820 * Atlanta, GA 30326 * USA
> Tel: +1.404.760.1560
> Email: ggregory@seagullsoftware.com<mailto:ggregory@seagullsoftware.com>
> Web: seagull.rocketsoftware.com<http://www.seagull.rocketsoftware.com/>
>
>
>


