lucene-general mailing list archives

From Marvin Humphrey
Subject Re: [DISCUSS] Archive Lucy
Date Fri, 06 Mar 2009 21:58:15 GMT

I am currently employed by Eventful, Inc., in San Diego, CA.  They are paying
me to work full-time on KinoSearch and Lucy.

I went out of my way when we negotiated the terms of my employment to ensure
that there was no way my contract could hamper or compromise progress towards
Lucy.  The actual document is confidential of course, but I feel comfortable
saying that first, our lawyers hammered out the legal nuts and bolts to my
satisfaction, and second, Eventful is fully on board with regard to Lucy.  By
way of illustration, my boss regularly hassles me about publishing a Lucy C
API, even though since Eventful uses the Perl bindings the benefits would be

In my opinion, it is not in the best interests of the Apache Lucene project to
make it more difficult for my employer and myself to contribute.

> It is fairly apparent to me that the Lucy project is not making any
> progress community-wise or code-wise.  Neither Marvin, Dave or Doug
> are active at all on it, and that accounts for all three committers.
> There has been very little mailing list traffic, 

You may have noticed that up until about three weeks ago (when I dove back
into the code cave), I was quite active on and in
the Lucene JIRA forums.  Significant design innovations were realized,
particularly in the area of real-time search.

In the past, many designs have been hashed out cooperatively on the KinoSearch
and Lucy mailing lists: the Schema class, revisions to QueryParser and the
boolean Query hierarchy, the implementation of human-readable index metadata,
C configuration probing, the OO model, index designs which exploit memory
mapping, and so on.

In this particular case, however, I was assigned the task of solving real-time
search, for which the Lucy and KinoSearch forums were not ideal.  There is a
very limited number of people who have both the familiarity with the
Lucene/Lucy segment-based inverted index model and the interest to discuss
real-time search at the level I desired, where concepts like "segment-centric
search" could be bandied about.  Basically, I needed Mike McCandless -- so I
went to where he could be found.

The conversations that we had in JIRA and on java-dev were beneficial to both
Lucene and Lucy; should I have posted to the Lucy dev list instead simply to
demonstrate activity, which would have been less useful to Mike, to me, to
Lucy, and to Lucene?  To my mind, the Lucene community is also part of the
Lucy community.  Mike's insights were welcome and useful, and it didn't seem
important to me which specific mailing list they wound up on -- they're all
under the domain, after all.  Weren't we all moving forward
together, and wouldn't that be apparent to members of the PMC such as

Or is this a zero-sum game where design innovations which help Lucy don't
count as "progress" if they also help Lucene?

> Furthermore, I have my doubts about the development process being employed,
> which seems to be the notion that KinoSearch is going to be donated by
> Marvin at some point in the future [1], which would only work if it were to
> go through the Software Grant or Incubation process (which I would be happy
> to support.), or at least that is how I understand the process to be when
> code is developed outside of the ASF.  

I understand why you might have thought that, but that's not how things will
play out, and it's a misreading of the post that you cite.

As you note, simply importing KinoSearch wholesale into the Lucy repository
with cosmetic changes would violate the terms of the project.  But even if
that were possible, it would represent a *horrendous missed opportunity*.

A KinoSearch 1.0 release, with permanent API and file format backwards
compatibility guarantees -- i.e. "there will never be a KinoSearch 2.0" --
will be very beneficial for Lucy's development.  Imposing such discipline
allows library users to proceed with maximum confidence.  For instance, it
allows Peter Karman, who has long planned to build a KS backend for Swish, to
move forward without having to worry about the upstream library pulling the
rug out from underneath his users.

Going that route will maximize our ability to learn the limitations and
weaknesses of the design.  Using the knowledge we gain, we can then forge
ahead as we have in the past: chunk by chunk, class by class.  And even though
I am very pleased with how pluggable index components, C API user interface
improvements, "OS-as-JVM" file format changes, and so on are coming along, I
anticipate lots of healthy debate and major discrepancies between what ends up
in KS 1.0 and what ends up in Lucy.

> Even if KS were the plan, in looking at KS, it seems there is not much
> community activity there, either.

This is largely due to the fact that it has been a long time since I released
any significant public updates.  I choose to release significant updates
infrequently because breaking backwards compatibility has severe consequences
for CPAN modules: as soon as the install completes, live apps start crashing.

Since there is no sane deprecation mechanism for dynamically loaded Perl
modules, minimizing backwards compatibility problems is a responsibility I
take seriously.

> On the flip side, one might ask what's the harm in letting it stand as
> is?  Admittedly, not much, other than I think it confuses people b/c
> they think there is a C port of Lucene and then they go and find it is
> dead.

Indeed.  It's not like Lucy in its present form causes harm to the bottom line
of Lucid Imagination, Inc. ;)

> Therefore, it is with some hesitation that I suggest we mothball
> Lucy.  Mostly, I hesitate, because I hate to see any project be
> archived on the hope that someone will come in and pick it up.
> However, I just don't see that happening.  If Marvin wishes to
> resurrect it, he can donate KS (or whatever core part of it is Lucy)
> and go through incubation and prove there is a community and then we
> can turn it back on.

Please give me two to three months to make the next dev release of KinoSearch.
FWIW, if I can't get a release out within that time frame, I'm going to have
to answer to Eventful. :)

This release will introduce real-time search, improved subclassing support, an
mmap-friendly index file format, and pluggable indexing components.  I suspect
aspects of it may be of interest to the Java Lucene dev community -- but if
that's the case, I won't hold it against you. ;)


Marvin Humphrey
