incubator-lucy-dev mailing list archives

From Nick Wellnhofer <wellnho...@aevum.de>
Subject Re: [lucy-dev] Implementing a tokenizer in core
Date Wed, 30 Nov 2011 22:29:26 GMT
On 30/11/11 17:04, Marvin Humphrey wrote:
> The script likely belongs in trunk/devel/bin.
>
> The file with the generated tables could arguably go in a few different
> places.  I would suggest either trunk/core/Lucy/Analysis/WordBreakTables.c
> if the tables are specialized, or trunk/core/Lucy/Util/UnicodeProperties.c if
> we anticipate adding more tables in the future.

OK, things are getting a little more complicated. I'd also like to 
generate some #defines along with the tables, so I could either generate 
a separate .h file, or I could simply create a single .c file that gets 
included by another .c file. This is not very tasteful but it would 
simplify things.
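
For illustration, the single-file option could look roughly like the sketch below. Everything in it is hypothetical: the WB_* names, the two-level table layout, and the toy data (ASCII only, where letters map to ALetter, digits to Numeric, and everything else is collapsed to "other") are assumptions made for this example, not the actual generated output. The point is only that the #defines and the tables come out of the same generator run, so the shift and mask constants cannot drift out of sync with the table dimensions.

```c
#include <assert.h>
#include <stdint.h>

/* ---- Hypothetical generated part; in the single-file variant this would
 * ---- sit in its own .c file and be pulled in with #include. ---- */
#define WB_OTHER        0
#define WB_ALETTER      1
#define WB_NUMERIC      2
#define WB_BLOCK_SHIFT  4      /* 16 code points per block */
#define WB_BLOCK_MASK   0xF
#define WB_NUM_BLOCKS   8      /* toy table covers U+0000..U+007F only */

/* First level: maps a block number to a row of wb_blocks. */
static const uint8_t wb_block_index[WB_NUM_BLOCKS] = {
    0, 0, 0, 1, 2, 3, 2, 3
};

/* Second level: one property value per code point within a block. */
static const uint8_t wb_blocks[4][16] = {
    /* row 0: nothing of interest */
    { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
    /* row 1: U+0030..U+0039 are digits */
    { 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0 },
    /* row 2: letters at offsets 1..15 (A-O, a-o) */
    { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 },
    /* row 3: letters at offsets 0..10 (P-Z, p-z) */
    { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0 }
};
/* ---- End of hypothetical generated part. ---- */

/* Hand-written code that relies on the generated constants. */
static int
wb_property(uint32_t code_point) {
    assert(code_point <= 0x10FFFF);
    uint32_t block = code_point >> WB_BLOCK_SHIFT;
    if (block >= WB_NUM_BLOCKS) {
        return WB_OTHER;       /* outside the toy table */
    }
    return wb_blocks[wb_block_index[block]][code_point & WB_BLOCK_MASK];
}
```

With a separate generated .h file, the #define lines would move into the header and the tables would stay in the .c file. The lookup code would be the same either way.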

Another question: The Perl script that generates the tables uses text 
files from http://www.unicode.org/Public/UNIDATA/. Should we bundle 
these files with Lucy?

Nick
