lucy-dev mailing list archives

From Nathan Kurz <>
Subject Re: [lucy-dev] ClusterSearcher
Date Thu, 10 Nov 2011 22:03:24 GMT
On Mon, Nov 7, 2011 at 1:50 PM, Marvin Humphrey <> wrote:
> On Sun, Nov 06, 2011 at 08:39:51PM -0800, Dan Markham wrote:
>> ZeroMQ and Google's Protocol Buffers both looking great for building a
>> distributed search solution.
> The idea of normalizing our current ad-hoc serialization mechanism using
> Google Protocol Buffers seems interesting, though it looks like it might be a
> lot of work and messy besides.
> First, Protocol Buffers doesn't support C -- only C++, Python and Java -- so
> we'd have to write our own custom plugin.  Dunno how hard that is.

While I'm relying on Google rather than hands-on experience, I don't
think that C support is actually a problem.  There seem to be existing
C bindings, or we could roll our own plugin.

> Second, the Protocol Buffers compiler is a heavy dependency -- too big to
> bundle.  We'd have to capture the generated source files in version control.

Alternatively, it could just be a dependency.  While I recognize your
desire to keep the core free of such, I think it's entirely reasonable
for LucyX packages to require outside libraries and tools.  The
question would be whether it's reasonable or desirable to relegate
ClusterSearch to non-core.

> Further investigation seems warranted.  It would sure be nice if we could lower
> our costs for developing and maintaining serialization routines.

On Mon, Nov 7, 2011 at 2:39 PM, Nick Wellnhofer <> wrote:
> MessagePack might be worth a look. See

Yes, that looks good too.  I'm not suggesting that we restrict ourselves
to Protocol Buffers, only that it should be possible to use them for
interprocess communication, among other options.  A good architecture
(in my opinion) would be one that allows the over-the-wire protocol to
change without requiring in-depth knowledge of Lucy's internals.  I
think the key is to have a clear definition of what "information" is
required by each layer of Lucy, rather than serializing and
deserializing raw objects.

> As for ZeroMQ, it's LGPL which pretty much rules it out for us -- nothing
> released under the Apache License 2.0 can have a required LGPL dependency.

You know these rules better than I do, but I worry that your
interpretations are stricter than Apache's legal policy actually
requires.  There's room for optional dependencies.  For example, it
looks like Apache Thrift (another alternative protocol to consider)
isn't scared of ZeroMQ.

>> Regardless of the path we go for building / shipping clustered search
>> solution.  I'm mostly interested in the api's to the lower level lucy that
>> make it possible and how to make them better.
> Well, my main concern, naturally, is the potential burden of exposing low-level
> internals as public APIs, constraining future Lucy core development.

It's a good concern, and I'm not certain what Dan is envisioning, but
I'm hoping that improving the APIs means _less_ exposure of the
internals.  Rather than passing around Searcher and Index objects
everywhere, I'd love to make it explicitly clear what information is
available to whom:  if a remote client doesn't return it, you can't
use it.  Instead of increasing exposure for remote clients, we'd
simplify the interface to local Searchers.

> If we actually had a working networking layer, we'd have a better idea about
> what sort of APIs we'd need to expose in order to facilitate alternate
> implementations.  Rapid-prototyping a networking layer in Perl under LucyX with
> a very conservative API exposure and without hauling in giganto dependencies
> might help with that. :)

Yes!  I don't want to stand in the way of progress.  Prototyping
something that works is a great idea.  I don't have the fear of
dependencies that you do, but if you think it's faster to build
something simple from the ground up rather than using a complex
existing package, have at it!

