incubator-lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-dev] Bundling Snowball
Date Sun, 21 Nov 2010 03:36:32 GMT

On Wed, Nov 10, 2010 at 12:10:51PM -0500, Robert Muir wrote:
> One more note that I forgot to mention: in snowball's svn (but i think
> not in the libstemmer pkg) there is actually vocabulary test data:
> input files containing a sample vocabulary for each language, expected
> output, and combined files called 'diffs' that show what the stemmer
> changes.
> 
> these provide pretty good coverage for tests to ensure your
> integration is working... when they make a change to the algorithms
> these are updated too (though it seems not always in the same commit):

I reopened <https://issues.apache.org/jira/browse/LUCY-125> to add tests based
on the Snowball vocabulary materials.

Those "diff" files are quite large.  Instead of including them all, I just
extracted a sampling of 10 words per language.  That's enough to verify that
our Stemmer Analyzer is at least working for each language, and in my view
it's not necessary to run the full battery of Snowball vocab tests within the
Lucy test suite.
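
For illustration only, here is a minimal Python sketch of how a small
per-language sample might be drawn from one of those files.  It is not the
script used for LUCY-125.  It assumes each line holds a word and its expected
stem separated by whitespace; the function name sample_diffs and the fixed
seed are hypothetical choices made for this example.

```python
import random


def sample_diffs(lines, n=10, seed=0):
    """Pick up to n (word, stem) pairs from lines shaped like 'word  stem'.

    Assumption: two whitespace-separated fields per line.  Lines that do
    not match that shape (blank lines, extra fields) are skipped.
    """
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            pairs.append((fields[0], fields[1]))
    # A fixed seed keeps the sample stable, so the extracted test data
    # does not change from one run to the next.
    rng = random.Random(seed)
    return rng.sample(pairs, min(n, len(pairs)))
```

Each sampled pair can then be stored as test data, with the test asserting
that the stemmer turns the word into the expected stem.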

Marvin Humphrey
