lucy-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: [lucy-dev] Bundling Snowball
Date Wed, 10 Nov 2010 17:10:51 GMT
On Tue, Nov 9, 2010 at 3:53 PM, Marvin Humphrey <marvin@rectangular.com> wrote:
> On Tue, Nov 09, 2010 at 04:51:33AM -0500, Robert Muir wrote:
>> Some quick notes, from lucene-java:
>

One more note that I forgot to mention: in Snowball's svn (but I think
not in the libstemmer package) there is actually vocabulary test data:
input files containing a sample vocabulary for each language, the expected
output, and combined files called 'diffs' that show what the stemmer
changes.

These provide pretty good coverage for tests to ensure your
integration is working. When they make a change to the algorithms,
these files are updated too (though it seems not always in the same commit):

example: http://svn.tartarus.org/snowball/trunk/data/german/diffs.txt?r1=527&r2=526&pathrev=527
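A rough sketch of how such a check could look, assuming the vocabulary and expected-output files hold one word per line in matching order (I have not verified the exact layout or file names here). The names read_words, find_mismatches and toy_stem are made up for illustration, and toy_stem is only a stand-in for whatever binding wraps the real stemmer:

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def read_words(path: Path) -> List[str]:
    """Read one word per line, keeping blank lines so paired files stay aligned."""
    return path.read_text(encoding="utf-8").splitlines()


def find_mismatches(
    stem: Callable[[str], str],
    vocabulary: Iterable[str],
    expected: Iterable[str],
) -> List[Tuple[str, str, str]]:
    """Return (word, expected_stem, actual_stem) for every word that stems wrong."""
    vocabulary = list(vocabulary)
    expected = list(expected)
    if len(vocabulary) != len(expected):
        # Assumption: the two files are line-for-line parallel.
        raise ValueError("vocabulary and expected output differ in length")
    mismatches = []
    for word, want in zip(vocabulary, expected):
        got = stem(word)
        if got != want:
            mismatches.append((word, want, got))
    return mismatches


def toy_stem(word: str) -> str:
    """Placeholder stemmer (NOT Snowball): strips one trailing 's'."""
    if len(word) > 1 and word.endswith("s"):
        return word[:-1]
    return word


if __name__ == "__main__":
    # Inline sample data; with real files, pass read_words(...) results instead.
    for word, want, got in find_mismatches(
        toy_stem, ["cats", "dog", "running"], ["cat", "dog", "run"]
    ):
        print(f"{word}: expected {want!r}, got {got!r}")
```

An integration test would swap toy_stem for the real stemmer call and fail if the returned list is non-empty for any language.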
