incubator-lucy-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: [lucy-dev] StandardTokenizer has landed
Date Tue, 06 Dec 2011 16:01:32 GMT
On Tue, Dec 6, 2011 at 8:45 AM, Nick Wellnhofer <wellnhofer@aevum.de> wrote:
> What I still want to do is to incorporate the word break test cases from the
> Unicode website:
>
> http://www.unicode.org/Public/6.0.0/ucd/auxiliary/WordBreakTest.txt
>

We use a script that generates a unit test from this file. Maybe you
can reuse some of the code for your purposes:
http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/test/org/apache/lucene/analysis/core/generateJavaUnicodeWordBreakTest.pl
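
For illustration only, here is a minimal Python sketch of how one line of
WordBreakTest.txt could be turned into a test case. It is not the Perl
script linked above, and the function name parse_word_break_line is
hypothetical. It assumes the usual format of that file: hex code points
separated by "÷" (break allowed) or "×" (no break), followed by a "#"
comment.

```python
def parse_word_break_line(line):
    """Parse one WordBreakTest.txt line into (text, expected_segments).

    Returns None for blank or comment-only lines.
    """
    # Everything after '#' is a human-readable comment.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None

    text_chars = []   # all characters of the test string, in order
    segments = []     # expected pieces between break opportunities
    current = []      # characters of the segment being built

    for token in data.split():
        if token == "\u00f7":      # DIVISION SIGN: a break is allowed here
            if current:
                segments.append("".join(current))
                current = []
        elif token == "\u00d7":    # MULTIPLICATION SIGN: no break here
            continue
        else:                      # a hexadecimal code point
            ch = chr(int(token, 16))
            current.append(ch)
            text_chars.append(ch)

    if current:
        segments.append("".join(current))
    return "".join(text_chars), segments


if __name__ == "__main__":
    sample = "\u00f7 0061 \u00d7 0062 \u00f7 0020 \u00f7\t# sample comment"
    print(parse_word_break_line(sample))
```

Note that the segments returned here are every span between break
opportunities, including whitespace and punctuation. A tokenizer test
would probably need to filter out the spans that the tokenizer is not
expected to emit as tokens.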

-- 
lucidimagination.com
