lucene-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: lucy progress
Date Fri, 18 Apr 2008 02:22:50 GMT

On Apr 17, 2008, at 3:00 PM, John Wang wrote:

> What is the current progress on Lucy?

For various reasons, Dave Balmain has been largely unavailable over  
the last year.  Without my primary co-conspirator, and without a user  
base, there weren't a lot of people to bounce ideas off of, so I chose  
to go where the action was -- back to the KinoSearch user base.

However, when I went back, I took the designs and concepts that Dave  
and I had hashed out for Lucy and worked them into KS.  So now those  
designs have been aired out and tested over a bunch of KS devel releases,  
and I've managed to work through a number of problems we'd left  
unresolved.  In the process I've accumulated a fair amount of material  
I can commit to the Lucy repo after a bit of cleanup, and for some  
reason three different emails arrived today inquiring about Lucy's  
status -- so I should probably get busy.  My plan is to finish the  
next KS release before I get back to Lucy in earnest, though, so a  
formal Lucy release is not imminent.

> Which version of Lucene index format is it up to?

We aren't at that point.  In any case, the only thing which has given  
Lucy a shot of working with Lucene indexes is the very recent  
resolution of LUCENE-510 -- Lucy wouldn't have worked with Lucene  
files at all except perhaps in a crippled compatibility mode had the  
Lucene file format not changed.

Looking forward, file format compatibility may continue to be a  
bugaboo.  The Lucene format spec document was written up as an  
afterthought rather than composed, and it is exceedingly difficult to  
implement unless you are able to do a close line-by-line port -- which  
you can't when your target is a dynamic language.  For any port to  
establish and maintain compatibility with the spec is an expensive,  
fiddly time-suck, and that will continue to be the case so long as the  
file format changes as rapidly as it has historically.

Indeed, my primary goal with the next KS release is to design and  
write up a formal file spec which, when compared with the current  
Lucene spec, is: shorter, simpler, more coherent, easier to implement,  
easier to extend, evolves more gracefully, uses human-readable  
metadata, and perhaps even lends itself to faster searching and  
indexing.  To see some of what I'm up to, follow the discussion that  
Mike McCandless and I are having under the "Flexible indexing design"  
and the "Pooling of postings in DocumentsWriter" threads.

If I'm successful in that file spec design effort, perhaps I can  
persuade the Lucene community that it's in everyone's interest to  
adopt some of its major elements -- or even better, collaborate with  
other Lucene devs to improve on it.  There's historical precedent for  
that best-case course of events in how Lucene's recent indexing speed  
improvements came about.

A saner file spec would make Lucy (and other ports) *much* less  
complicated to write and maintain.  That would represent significant  
"lucy progress", and that's where a good fraction of my energies are  
being expended right now.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



