lucene-general mailing list archives

From "Mattmann, Chris A (388J)" <>
Subject Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Date Mon, 01 Mar 2010 18:48:58 GMT
Hey Hoss,

I support Mike's original suggestion of having a shared, independently maintained/released
analysis package for Nutch/Solr/Lucene. I emphatically do not support merging Solr and Lucene
in the way proposed.

Hope that clarifies things, at least from me.


On 3/1/10 11:43 AM, "Chris Hostetter" <> wrote:

(Man, why is it you guys always decide to start the monolithic
"let's redesign the world" threads while i'm offline for a few days ...
I figured at worst I'd 'svn up' and discover that McCandless had
reimplemented all of the indexing code in Scala, but i certainly wasn't
expecting all of this.)

As someone who has attempted to read it all at once, let me just say that
this thread is way too big.

I say this not as a facetious comment about the number of messages or the
depth of replies but as a serious comment about the breadth and depth of
the core issues that people seem to be trying to address in a monolithic
fashion -- monolithic suggestions which are in many ways diametrically
opposed to each other.

Without obvious consensus on where we want to go, or a clear sense of
how well things will work when we get "there", it seems most productive
to focus on what would be needed to achieve some incremental steps that
could be productive for any/all goals.

At its core: this thread started with McCandless's suggestion that
refactoring some of the text analysis code from Solr, Nutch and Lucene-Java
out of all three projects and into a common code base would be beneficial
to all three subprojects -- Not only do I see no flaw to that reasoning,
but it also seems like it would (oddly enough) serve as a good first step
towards *either* tighter development integration between Lucene-Java and
Solr, *OR* towards looser development of the two code bases (via making
Solr a separate TLP).

Developing a new code module like this should help demonstrate / exercise
some of the "process" issues that might come up in trying to integrate the
development and release processes of the existing products.  If things
work out "well" that may illustrate that tighter integration is better; if
things work out "poorly" that should also tell us something, and may give
us guidance on how to move forward.  In the worst case scenario that i can
imagine: some code is refactored out of Solr and Nutch in a way that makes
it more directly usable by other consumers of Lucene-Java.  (Even if Solr
and Nutch never use that code and become their own TLPs and secede from
the ASF to become a Caribbean tax haven, that seems like a net win for

To put the issue another way: Does anyone see how McCandless's suggestion
would be counter-productive towards your vision of what Lucene/Solr/Nutch
should be like in the future? (regardless of what your particular vision is)


: I started here with analysis because I think that's the biggest pain
: point: it seemed like an obvious first step to fixing the code
: duplication and thus the most likely to reach some consensus.  And
: it's also very timely: Robert is right now making all kinds of great
: fixes to our collective analyzers (in between bouts of fuzzy DFA
: debugging).


Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
