lucene-general mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [DISCUSS] Archive Lucy
Date Mon, 09 Mar 2009 19:33:15 GMT
Jukka Zitting:

> Try to look at our end of the telescope

I understand the metric you are applying, and I understand and support the
Apache community activity threshold test.  It is clear to me why this audit
has been triggered, and that by pursuing the matter the members of the PMC are
acting in the best interests of Apache.  Nevertheless, I believe that the
outcome of the audit will be something other than a simple shelving.

There are two main factors that have brought us to today's state of affairs.

First, after a promising and productive start, Dave Balmain became (mostly)
unavailable.  If Balmain had been able to maintain his participation, we
wouldn't be having this conversation.

Second, I became absolutely determined to improve the extensibility situation.
I haven't been able to give up on pluggable index components, flexible
indexing, bindings that facilitate rapid prototyping in the host language, a
robust ABI for compiled extensions, loose coupling between the library and its
file format, and human readable metadata and simple binary formats that make
index files easier to grok and debug.

That ambitious agenda is entirely in the spirit of Lucy.  Lucy aims to empower
diverse communities and feed off the dynamism of varied approaches.  It looks
to harness "loose port" energy and encourage cross-pollination.  By making it
as easy as possible to prototype powerful extensions in the host language, the
greater community benefits from being able to evaluate different perspectives.
Then, by making it as easy as possible to port the prototype to a compiled
extension, we encourage ideas to seep across language boundaries.

However, completing all those agenda items requires esoteric, difficult core
architectural work, boring and out of reach for most people.  The elements in
that list aren't the kind of thing where someone just gets an itch they want
to scratch and submits a patch -- they're the kind of thing that allows people
to scratch itches.  My colleagues at Eventful find the Lucy-originated
subclassing abilities in KS pretty convenient, but when I get all amped about
"installing callbacks into dynamically generated vtables", their eyes don't
exactly light up.

If Balmain had been around, we would have been designing these things jointly.
Instead, I've been seeking out collaborators where they can be found.  The OO
hierarchy that supports pluggable indexing components grew out of discussions
with Jason Rutherglen and Mike McCandless.  Nate Kurz was an early and
relentless advocate of mmap.  Peter Karman strongly influenced the design of
Schema and the implementation of human-readable index metadata; his experience
slinging Swish configuration files around came in handy.

When Lucy launched, we didn't have any of those features in either Ferret or
KinoSearch.  Now, prototype implementations are all either done or nearing
completion.  We are moving forward.

> As Grant says, you and others could start fixing this by bringing at
> least a part of the design discussions and related work to the Lucy
> forums. 

The way this started off was with me maintaining duplicate code in the
KinoSearch and Lucy repositories.  (I have a little script I use to swap out
names and copyright notices.)  Code that was developed for Lucy was inserted
into KS so that it would get aired out in real-world situations.  Over time, I
expected Lucy code to gradually take over the KS repository.  (This is where
Peter, Nate and others acquired the impression that KS would ultimately become
Lucy.)

When Balmain was forced to curtail his participation, I gradually began to
develop features needed by Lucy in KS proper and put off the formal discussions
and the cross-commits.  Code started to accumulate, but it was always my
expectation that the material would need to be evaluated before going into
Lucy.

The sticking point is the full implementation of the "Boilerplater" code
generator that Balmain and I sketched out, which is now a small compiler,
along with five core utility classes: Obj, VTable, CharBuf, VArray and Hash.
I've been reluctant to commit those to Lucy without review, but there aren't
very many people out there who have both the expertise and the energy to
review them.  Yet since Boilerplater is a build tool prerequisite, without it
little else can go into the Lucy repository.  It's technically an
implementation detail, since it generates C header files which form a public
API rather than exposing a public API itself -- but it's a pretty important
implementation detail.

As time has passed, we've gotten closer to the point where KS could publish a
public C API using tools that were built for Lucy.  Real-world feedback from
users would substitute for Balmain's formal review, and even if no single user
had Dave's level of expertise, we'd have enough to satisfy the community
participation requirement and Boilerplater could go in.

Unfortunately, while we're close to a public C API for KS, we aren't quite
there yet; too much time has passed, and the lack of visible activity on Lucy
has triggered an audit.  

Marvin Humphrey

