lucene-general mailing list archives

From Mark Miller
Subject Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Date Mon, 01 Mar 2010 19:27:33 GMT
On 03/01/2010 01:43 PM, Chris Hostetter wrote:
> (Man, why is it you guys always decide to start the monolithic
> "let's redesign the world" threads while i'm offline for a few days ...
> I figured at worst I'd 'svn up' and discover that McCandless had
> reimplemented all of the indexing code in Scala, but i certainly wasn't
> expecting all of this.)
> As someone who has attempted to read it all at once, let me just say that
> this thread is way too big.
> I say this not as a facetious comment about the number of messages or the
> depth of replies but as a serious comment about the breadth and depth of
> the core issues that people seem to be trying to address in a monolithic
> fashion -- monolithic suggestions which are in many ways diametrically
> opposed to each other.
Personally, I don't think the idea of a merge is too big. I think the 
implications of it are less than you are making them out to be.
Monolithic suggestions? Let's half merge? Let's draft a resolution 
indicating that both Lucene and Solr devs would like to possibly play 
nicer together with more communication? I don't think there are a lot of 
baby steps towards this goal that will have any meaning or ramifications.

> Without obvious consensus on where we want to go, or a clear sense of
> how well things will work when we get "there" it seems most productive
> to focus on what would be needed to achieve some incremental steps that
> could be productive for any/all goals.
That sounds like magic to me :) Or focusing on stuff that has nothing to 
do with a merge or TLP.
> At its core: this thread started with McCandless's suggestion that
> refactoring some of text analysis code from Solr, Nutch and Lucene-Java
> out of all three projects and into a common code base would be beneficial
> to all three subprojects -- Not only do I see no flaw to that reasoning,
> but it also seems like it would (oddly enough) serve as a good first step
> towards *either* tighter development integration between Lucene-Java and
> Solr, *OR* towards looser development of the two code bases (via making
> Solr a separate TLP).
> Developing a new code module like this should help demonstrate / exercise
> some of the "process" issues that might come up in trying to integrate the
> development and release processes of the existing products.  If things
> work out "well" that may illustrate that tighter integration is better; if
> things work out "poorly" that should also tell us something, and may give
> us guidance on how to move forward.  In the worst case scenario that i can
> imagine: some code is refactored out of Solr and Nutch in a way that makes
> it more directly usable by other consumers of Lucene-Java.  (Even if Solr
> and Nutch never use that code and become their own TLPs and secede from
> the ASF to become a Caribbean tax haven that seems like a Net win for
> Lucene-Java)
> To put the issue another way: Does anyone see how McCandless's suggestion
> would be counter-productive towards your vision of what Lucene/Solr/Nutch
> should be like in the future? (regardless of what your particular vision is)
No, not necessarily - but I don't think it's going to tell us anything 
useful about a merge. It's just going to factor out some analyzers into 
what is likely going to be yet *another* project with more "do we run on 
trunk" or "don't we" issues. Or it will be a Lucene contrib, and cause us 
even more headaches due to Solr not running on trunk.

> 			...
> : I started here with analysis because I think that's the biggest pain
> : point: it seemed like an obvious first step to fixing the code
> : duplication and thus the most likely to reach some consensus.  And
> : it's also very timely: Robert is right now making all kinds of great
> : fixes to our collective analyzers (in between bouts of fuzzy DFA
> : debugging).
> -Hoss

- Mark
