lucene-general mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Date Sun, 28 Feb 2010 10:57:05 GMT
To make this more concrete, I think this is roughly what's being
proposed:

  * Merging the dev lists into a single list.

  * Merging committers.

  * When a change is committed to Lucene, it must pass all Solr
    tests.

  * Release both at once.

These things would not change:

  * Most importantly, the source code would remain factored into
    separate dirs/modules.

  * Users' lists should remain separate.

  * Web sites would remain separate.

  * Solr & Lucene are still separate downloads, separate JARs,
    separate subdirs in the source tree, etc.

The outside world still sees Solr & Lucene as separate entities.  It's
only that they would now be developed/released in synchrony.

There are some important gains by doing this:

  * A single source for all the code now duplicated across the
    projects (my original reason for starting this, specifically for
    analyzers).

  * Whenever a new feature is added to Lucene, we'd work through what
    its impact on Solr is.  This can still mean we develop the Solr
    exposure separately, but it would at least get us thinking about
    it sooner.

  * Solr is Lucene's biggest direct user -- most people who use Lucene
    use it through Solr -- so having it more closely integrated means
    we know sooner if we broke something.

  * If we were merged, I could test right now whether flex breaks
    anything in Solr.  I can't do that today since Solr hasn't been
    upgraded to 3.1.

Recent big changes (e.g. segment-based searching, Version, the
attribute-based TokenStream API) caused a lot of work in Solr that
could've been much smoother had Solr "been there" as we were working
through them.

Recent new features, e.g. near-real-time search, which are still
unavailable in Solr, would at least have had some discussion about
how to expose them in Solr.

Over time (and we don't have to do this right on day 1) we can make
Solr's core capabilities available to pure Lucene.  E.g., core Lucene
users should be able to use faceting, use a schema, etc.

I think this idea makes a lot of sense and I think now is a good time
to do it.  Yes, this is a big change, but I think the gains are sizable.
As Lucene & Solr diverge more, it'll only become harder and harder to
merge.

Robert's massive patch on SOLR-1657, upgrading most of Solr's analyzers
to 3.0, is aging... while other changes to analyzers are being
proposed (SOLR-1799).  If we were integrated (or at least single
source for analyzers), Robert would already have committed it.

Mike

On Fri, Feb 26, 2010 at 5:20 PM, Yonik Seeley
<yonik@lucidimagination.com> wrote:
> On Fri, Feb 26, 2010 at 5:15 PM, Steven A Rowe <sarowe@syr.edu> wrote:
>> On 02/24/2010 at 2:20 PM, Yonik Seeley wrote:
>>> I've started to think that a merge of Solr and Lucene would be in the
>>> best interest of both projects.
>>
>> The Sorlucene :) merger could be achieved virtually, i.e. via
>> policy, rather than physically merging:
>
> Everything is virtual here anyway :-)
> I agree with Mike that a single dev list is highly desirable.  There
> would still be separate downloads.  What to do with some of the other
> stuff is unspecified.
>
> Committers would need to be merged though - that's the only way to
> make a change across projects w/o breaking stuff.
>
> -Yonik
>
