lucene-general mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [PMC] [DISCUSS] Lucy
Date Thu, 17 Jun 2010 18:30:58 GMT
On Thu, Jun 17, 2010 at 06:25:13AM -0400, Grant Ingersoll wrote:
> I guess a big question I have is how does Lucy actually relate to Lucene?

When Lucy was conceived, we envisioned that our eventual sub-communities
(Perl, Ruby, etc) would approach using and extending the library in distinct
ways, and that we would be able to harness the creative tension between the
different cultures to drive innovation.

Lucy and Lucene have that kind of a relationship today.

> But, AFAICT, and Marvin, please correct me, it doesn't share the file format
> and it doesn't share the API as those have diverged a fair bit ago.  It also
> doesn't share users or devs.  

The APIs have diverged, but they are not very far apart.  The biggest
difference is that in addition to analysis, index, search, store and util, we
also have "plan": Lucy incorporates per-index schema definitions into the core
library, while in Lucene, that's handled at the Solr layer.  

We had planned to use per-index field semantics from Day 1 -- that's the
reason that the Lucy proposal makes explicit mention of using "KinoSearch's
back end as a template".  The original rationale was technical -- the merge
algorithm of Lucene 1.4.3 involved a huge amount of object creation and
destruction that crushed performance under OO models more expensive than
Java's.  Of course that architecture wasn't ideal for Java, either, and since
Mike McCandless incorporated some of those ideas into Lucene 2.3 the projects
have converged.

Aside from that, the API differences are mostly superficial.  Both libraries
have TermQuery classes, but in Lucy we don't have a Term class -- a "term" is
just a generic object.  (That allows us to implement numeric types more
easily.) The Lucy TermQuery constructor would look like this in Java:

    public TermQuery(String field, Object term) { ... }

So, both libraries have TermQuery classes, and they both take a field and a
search term, but the constructor method signatures vary slightly.  That's not
an important distinction, IMO.  
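
To make the contrast concrete, here is a minimal sketch.  The class names
TermQuerySketch, LuceneStyleTermQuery and LucyStyleTermQuery, and the
describe() method, are made up for illustration -- they are not the real
classes of either library.  The only parts taken from the libraries are the
constructor shapes: Lucene's TermQuery(Term) with Term(String, String), and
the Lucy-style (String field, Object term) signature shown above.

```java
import java.util.Objects;

// Hypothetical sketch only: stand-ins for illustration, not the real
// Lucene or Lucy classes.
final class TermQuerySketch {

    // Lucene-style: field and text are bundled into a Term object first.
    static final class Term {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = Objects.requireNonNull(field);
            this.text = Objects.requireNonNull(text);
        }
    }

    // Mirrors the shape of Lucene's TermQuery(Term) constructor.
    static final class LuceneStyleTermQuery {
        final Term term;

        LuceneStyleTermQuery(Term term) {
            this.term = Objects.requireNonNull(term);
        }

        String describe() {
            return term.field + ":" + term.text;
        }
    }

    // Lucy-style: no Term class; the term is just a generic object,
    // so a number can be passed as easily as a string.
    static final class LucyStyleTermQuery {
        final String field;
        final Object term;

        LucyStyleTermQuery(String field, Object term) {
            this.field = Objects.requireNonNull(field);
            this.term = Objects.requireNonNull(term);
        }

        String describe() {
            return field + ":" + term;
        }
    }

    public static void main(String[] args) {
        // Same field and search term, two constructor shapes.
        System.out.println(
            new LuceneStyleTermQuery(new Term("title", "lucy")).describe());
        System.out.println(
            new LucyStyleTermQuery("title", "lucy").describe());
        // A non-string term needs no special wrapper in the Lucy style.
        System.out.println(
            new LucyStyleTermQuery("year", 2010).describe());
    }
}
```

In both cases the caller supplies the same two pieces of information; the
only difference is whether they are wrapped in an intermediate object first.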

Even so, there has been talk of deprecating Term in Lucene, as well -- so
perhaps the projects will converge again.  I suspect that this sort of
convergence will keep happening over time as successful innovations
cross-pollinate.  I'm very curious to see how our differing approaches to
near-real-time search play out.

With regard to the file format, in my opinion it's not realistic for Lucy or
any other library that is not derived line-for-line from Java Lucene's source
code to support Lucene indexes.  The Lucene file format is an implementation
detail, not a public spec, and it's incredibly complex.  I think we can go the
other way, though, and make it possible for Lucene to read Lucy indexes.

With regard to non-overlapping users and devs, well, I think that the Apache
board is taking the correct approach in moving sub-projects to TLPs when that
kind of a situation exists, and that Lucy should aim to develop more
contributors, committers, and ultimately potential Lucy PMC members, then
graduate.

> I ask those things, b/c I think the answers will help us understand better
> whether this is something the Lucene PMC is interested in status checking,
> etc, to which it hasn't shown a track record of doing to date.

My sense is that Lucy had better self-regulate or else.  :)  I think Doug's
suggestion of following the incubation checklist gives us the tool we need
for that.  

Marvin Humphrey

