lucene-general mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Date Wed, 24 Feb 2010 21:04:51 GMT

On Feb 24, 2010, at 2:09 PM, Doug Cutting wrote:

> Michael McCandless wrote:
>> I think, in order to stop duplicating our analysis code across
>> Nutch/Solr/Lucene, we should separate out the analyzers into a
>> standalone package, and maybe as its own sub-project under the Lucene
>> tlp?
> 
> Is the goal to release these on a separate schedule from Lucene Java? If so, then this
> makes sense, if not, then perhaps this could be simply a separate source code tree in Lucene
> Java built as separate jars.
> 
> Where would the analyzer APIs live, in the core or in the analyzer tree?  My guess is
> that they'd live in the core, and that the analyzer tree would depend on the core, but one
> might do it the other way around if one felt there were non-Lucene uses for analyzers.
> 
> Note that subprojects with different committer lists are an anti-pattern at Apache.
> We've long done this in Lucene, but have recently been asked by the board to consider breaking
> most subprojects into their own TLPs.

Yeah, I've seen rumblings of this, but I'm not sure why it is a big deal here.  Many of Lucene's
projects are related and interoperate with some committer overlap, but not all.  For instance,
Lucene.NET and PyLucene don't have a lot of overlap committer-wise, but it would be silly
for them to be TLPs.  To me, Lucene has spun off subprojects when it makes sense, e.g. Hadoop
and potentially Mahout in the near future, but otherwise, "if it ain't broke, don't fix it".


> Would analyzers someday make sense as an independent TLP?  If not, then a subproject
> with disjoint committers might not be the right pattern.
> 

In my mind, all current committers for Lucene/Nutch/Solr would be committers on this
new project.

-Grant