lucy-dev mailing list archives

From Nick Wellnhofer <wellnho...@aevum.de>
Subject [lucy-dev] Sample extension for Lucy
Date Thu, 26 Apr 2012 15:45:49 GMT
I just published a compiled extension for Lucy on GitHub:

https://github.com/nwellnhof/LucyX-Analysis-WhitespaceTokenizer

It's a simple whitespace tokenizer that's not meant to be used in 
production but to serve as a sample extension for development. Here are 
some notes on stuff that's still to do:

Currently, we use the last component of the module name as the parcel 
name. This results in very long symbol names in the case of 
WhitespaceTokenizer. We should add a "parcel" build parameter to 
Clownfish::CFC::Perl::Build, so we can use something shorter like "WSToker".
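
To make the symbol length issue concrete, here is a rough sketch of how a parcel name taken from the last module component ends up in every symbol. The prefix convention (lowercased parcel plus an underscore) and the symbol layout are assumptions for illustration only, not taken from the CFC source, and the helper names and the "transform" method are hypothetical.

```python
def parcel_from_module(module_name: str) -> str:
    """Return the last '::'-separated component of a Perl module name."""
    return module_name.split("::")[-1]


def symbol_prefix(parcel: str) -> str:
    """Assumed convention: lowercased parcel name plus an underscore."""
    return parcel.lower() + "_"


def full_symbol(parcel: str, class_name: str, method: str) -> str:
    """Hypothetical symbol layout: <prefix><ClassName>_<method>."""
    return f"{symbol_prefix(parcel)}{class_name}_{method}"


module = "LucyX::Analysis::WhitespaceTokenizer"

# Current behaviour as described: the parcel is the last module component.
default_parcel = parcel_from_module(module)
long_name = full_symbol(default_parcel, "WhitespaceTokenizer", "transform")

# With an explicit, shorter parcel name such as "WSToker".
short_name = full_symbol("WSToker", "WhitespaceTokenizer", "transform")

print(long_name)
print(short_name)
```

Under these assumptions the default parcel repeats the class name in every symbol, while an explicit parcel like "WSToker" keeps the prefix short.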

In WhitespaceTokenizer.cfh I had to add a __C__ block that includes 
Lucy/Analysis/Inversion.h because the generated XS needs the 
LUCY_INVERSION VTable. That's not ideal.

As previously mentioned, all Lucy types used in WhitespaceTokenizer.cfh 
have to be prefixed with "lucy_".

There's an intricate problem with XSLoader that only manifests when 
running the tests. See the comment in WhitespaceTokenizer.pm.

It's very instructive to look at the code that's generated in autogen when 
building the extension, especially autogen/source/parcel.c.

Nick
