lucy-dev mailing list archives

From "Andrew S. Townley" <>
Subject Re: [lucy-dev] RegexTokenizer
Date Tue, 08 Mar 2011 19:35:59 GMT

On 8 Mar 2011, at 7:24 PM, Marvin Humphrey wrote:

> On Tue, Mar 08, 2011 at 05:50:34PM +0000, Andrew S. Townley wrote:
>> If you wanted to lock it in across host languages, then you could always
>> implement this in C using the library of your choice due to the
>> architecture, right?
> Yes, most likely using PCRE.  I think that would make sense to implement as an
> extension, distributed separately.  Bundling PCRE with core Lucy would provide
> very little benefit at a large cost, though.  Every host provides a regex
> engine that users are already familiar with, and I expect that few users will
> require indexes to work across multiple hosts.

Unless you're doing something crazy like what I plan to do (eventually, down the line): make
a common C++ codebase the Lucy client, then expose that codebase's API to multiple host
languages that all access the same underlying store infrastructure. ;)

In the current architecture I have with Ferret, even though everything's Ruby, nobody ever
touches Ferret directly.  The future architecture will have similar characteristics, but the
implementation languages and components will be different.

In this case, however, the regex support will most likely be implemented via Boost's regex
engine(s).  That's one of the other reasons I want match offset information to be available
from the search library directly instead of depending on anything from the (ultimate) host

Based on your answers, though, it still seems like this should be possible using a "C++ as C"
host implementation strategy, convoluted as it may sound.
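
A rough sketch of what that "C++ as C" idea might look like, purely for illustration. It
assumes std::regex as a stand-in for Boost's regex engine (the two interfaces are close), and
the names TokenSpan, tokenize and regex_tokenize are hypothetical; none of this is Lucy API.
The point is only that the match offsets come from the C++ side and cross a plain C boundary:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical token record: byte offsets into the original text.
struct TokenSpan {
    std::size_t start;  // offset of the first byte of the match
    std::size_t end;    // offset one past the last byte of the match
};

// Collect the offsets of every non-overlapping match of `pattern` in `text`.
// std::regex is an assumption here; a Boost-based version would look similar.
static std::vector<TokenSpan> tokenize(const std::string& text,
                                       const std::string& pattern) {
    std::vector<TokenSpan> spans;
    const std::regex re(pattern);
    for (std::sregex_iterator it(text.begin(), text.end(), re), last;
         it != last; ++it) {
        const std::size_t start = static_cast<std::size_t>(it->position(0));
        const std::size_t length = static_cast<std::size_t>(it->length(0));
        spans.push_back({start, start + length});
    }
    return spans;
}

// C-callable entry point: writes at most `max_out` spans into `out` and
// returns how many were written. Any C++ exception (for example an invalid
// pattern) is swallowed and reported as 0, because exceptions must not
// cross the C boundary. A real interface would want a separate error code.
extern "C" std::size_t regex_tokenize(const char* text, const char* pattern,
                                      TokenSpan* out, std::size_t max_out) {
    try {
        const std::vector<TokenSpan> spans = tokenize(text, pattern);
        const std::size_t n = spans.size() < max_out ? spans.size() : max_out;
        for (std::size_t i = 0; i < n; ++i) {
            out[i] = spans[i];
        }
        return n;
    } catch (...) {
        return 0;
    }
}
```

Any host with a C FFI could call regex_tokenize and get the same offsets, independent of the
host's own regex engine, which is the property I'm after.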


Andrew S. Townley <>
