lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-dev] Implementing a tokenizer in core
Date Thu, 01 Dec 2011 00:04:03 GMT
On Wed, Nov 30, 2011 at 11:29:26PM +0100, Nick Wellnhofer wrote:
> OK, things are getting a little more complicated. I'd also like to  
> generate some #defines along with the tables, so I could either generate  
> a separate .h file, or I could simply create a single .c file that gets  
> included by another .c file. This is not very tasteful but it would  
> simplify things.

All of those sound fine to me.  Sounds like you like the .h file option best,
so +1 to that.
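
For illustration, a generated header that carries #defines alongside a table
might look roughly like the sketch below. Every name in it (WB_ALETTER,
wb_range, wb_ranges, wb_lookup) is hypothetical, and the sorted range table
with a binary search is only an assumed layout; the real generated tables may
be organized quite differently.

```c
/* Hypothetical generated header. All names and the table layout are
 * illustrative assumptions, not the actual output of the script. */
#include <stddef.h>
#include <stdint.h>

/* Property codes emitted as #defines alongside the table. */
#define WB_OTHER    0
#define WB_ALETTER  1
#define WB_NUMERIC  2

/* One entry per code point range, sorted by first code point. */
typedef struct {
    uint32_t first;
    uint32_t last;
    uint8_t  prop;
} wb_range;

static const wb_range wb_ranges[] = {
    { 0x0030, 0x0039, WB_NUMERIC },  /* DIGIT ZERO..DIGIT NINE */
    { 0x0041, 0x005A, WB_ALETTER },  /* LATIN CAPITAL LETTER A..Z */
    { 0x0061, 0x007A, WB_ALETTER },  /* LATIN SMALL LETTER A..Z */
};

#define WB_NUM_RANGES (sizeof(wb_ranges) / sizeof(wb_ranges[0]))

/* Binary search over the sorted ranges; code points that fall in no
 * range get WB_OTHER. */
static int
wb_lookup(uint32_t cp) {
    size_t lo = 0;
    size_t hi = WB_NUM_RANGES;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < wb_ranges[mid].first) {
            hi = mid;
        }
        else if (cp > wb_ranges[mid].last) {
            lo = mid + 1;
        }
        else {
            return wb_ranges[mid].prop;
        }
    }
    return WB_OTHER;
}
```

Keeping the #defines and the table in one generated file means the property
codes cannot drift out of sync with the data that uses them.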

> Another question: The perl script that generates the tables uses text  
> files from http://www.unicode.org/Public/UNIDATA/. Should we bundle  
> these files with Lucy?

How about we provide a link in the script's docs to the monolithic archive of
the version of those files we want to use?  For instance:

    http://www.unicode.org/Public/6.0.0/ucd/UCD.zip

Then the script can just take the path to the expanded directory as an argument:

    perl devel/bin/gen_uniprops.pl /path/to/UCD
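
The property files in the UCD archive share a simple line format: a code point
or a range of code points, a semicolon, a property value, then a trailing
comment. The sketch below shows one way to parse such a line. It is written in
C only to match the core code; the generator itself is a Perl script, and
which files it actually reads is not stated here, so the WordBreakProperty.txt
style sample lines are an assumption and parse_ucd_line is a hypothetical
helper.

```c
#include <stdio.h>
#include <string.h>

/* Parse one data line such as
 *   "0041..005A    ; ALetter # L&  [26] LATIN CAPITAL LETTER A..."
 * or a single code point line such as
 *   "005F          ; ExtendNumLet # Pc       LOW LINE".
 * Returns 1 on success and 0 for blank lines, comment lines, or
 * malformed input.  prop must have room for at least 32 bytes. */
static int
parse_ucd_line(const char *line, unsigned *first, unsigned *last,
               char *prop) {
    /* Range form: FIRST..LAST ; Property */
    if (sscanf(line, "%x..%x ; %31[A-Za-z0-9_]", first, last, prop) == 3) {
        return 1;
    }
    /* Single code point form: CODE ; Property */
    if (sscanf(line, "%x ; %31[A-Za-z0-9_]", first, prop) == 2) {
        *last = *first;
        return 1;
    }
    return 0;
}
```

A caller would read each file in the directory line by line, skip the lines
for which the function returns 0, and collect the rest into ranges.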

We can also bundle if you prefer (the license allows it) -- it's just a little
more work and a little more bandwidth.

Marvin Humphrey

