lucy-dev mailing list archives

From Marvin Humphrey <>
Subject Re: [lucy-dev] Bundling Snowball
Date Wed, 10 Nov 2010 19:19:14 GMT
On Wed, Nov 10, 2010 at 12:10:51PM -0500, Robert Muir wrote:
> One more note that I forgot to mention: in snowball's svn (but i think not
> in the libstemmer pkg) there is actually vocabulary test data: input files
> containing a sample vocabulary for each language, expected output, and
> combined files called 'diffs' that show what the stemmer changes.
> these provide pretty good coverage for tests to ensure your
> integration is working... when they make a change to the algorithms
> these are updated too (though it seems not always in the same commit):
> example:

I used this sample data to prepare tests for the Lingua::Stem::Snowball CPAN
distribution.  Now that we are bundling the Snowball C libraries, we are no
longer benefitting by proxy from that test suite, and we should roll our own

Yesterday, I adapted the script in
<> to work off of an svn
checkout of Snowball; I committed the patches and closed the issue this

Now I'll go add test data generation to that script's capabilities and
add new test files for each language to validate that our stemmers work

Thanks for bringing it up!

Marvin Humphrey
