lucene-general mailing list archives

From Michael Busch <busch...@gmail.com>
Subject Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Date Sun, 28 Feb 2010 17:52:55 GMT
I'm not very happy with this proposal. I certainly understand what it
is trying to achieve, though. I'd like to see a tighter integration
and communication between Lucene core and SOLR too, but the proposed
requirements seem much too strict. For example, I think it's a good
idea for SOLR to ride on Lucene's trunk again. This will show
potential problems of API changes and new features in Lucene much more
quickly. It will also help SOLR to use new Lucene features much more 
quickly.

However, I'm -1 for these points:

  * When a change is committed to Lucene, it must pass all Solr tests.
  * Release both at once.

SOLR is a consumer of Lucene's API. So what this requirement basically
translates to is that I, as a Lucene committer, now have to not only
make sure Lucene's backwards-compatibility is ensured, but also that I
make all necessary changes in SOLR. So I suddenly have to know much more
code and potentially make many more changes. But this doesn't
help all the other Lucene consumers out there. I invested several
weeks upgrading our software at IBM to 3.0 APIs, because I had 5000
compile errors.
I think the Lucene backwards-compatibility policy is very strict
already, and it often takes more time to work on bw-compat than on the
actual feature. With the additional requirement above this will get
worse, and I'm afraid it might slow down Lucene's progress.

I don't disagree that things like moving function queries from SOLR to
Lucene have failed - but we have to ask why they weren't added to
Lucene in the first place. Was there ever a discussion whether those
queries should be added to Lucene or SOLR when they were developed? I'd
also love to see a powerful facet engine in Lucene, with SOLR building
its faceting features on top of those APIs.

So I'm +1 for better communication (maybe even merging the dev lists) and
especially talking about where a new feature should live before
working on a patch.

  Michael

On 2/28/10 2:57 AM, Michael McCandless wrote:
> To make this more concrete, I think this is roughly what's being
> proposed:
>
>    * Merging the dev lists into a single list.
>
>    * Merging committers.
>
>    * When a change is committed to Lucene, it must pass all Solr
>      tests.
>
>    * Release both at once.
>
> These things would not change:
>
>    * Most importantly, the source code would remain factored into
>      separate dirs/modules.
>
>    * User's lists should remain separate.
>
>    * Web sites would remain separate.
>
>    * Solr & Lucene are still separate downloads, separate JARs,
>      separate subdirs in the source tree, etc.
>
> The outside world still sees Solr & Lucene as separate entities.  It's
> only that they would now be developed/released in synchrony.
>
> There are some important gains by doing this:
>
>    * Single source for all the code dup we now have across the
>      projects (my original reason, specifically on analyzers, for
>      starting this).
>
>    * Whenever a new feature is added to Lucene, we'd work through what
>      the impact is to Solr.  This can still mean we separately develop
>      exposure in Solr, but it'd get us to at least more immediately
>      think about it.
>
>    * Solr is Lucene's biggest direct user -- most people who use Lucene
>      use it through Solr -- so having it more closely integrated means
>      we know sooner if we broke something.
>
>    * Right now I could test whether flex breaks anything in Solr.  I
>      can't do that now since Solr isn't upgraded to 3.1.
>
> Recent big changes (eg segment based searching, Version, attr based
> tokenstream api) caused a lot of work in Solr that could've been much
> smoother had Solr "been there" as we were working through them.
>
> Recent new features, eg near-real-time search, which are unavailable
> in Solr still, would have at least had some discussion about how to
> expose this in Solr.
>
> Over time (and we don't have to do this right on day 1) we can make
> core capabilities available to pure Lucene.  EG core Lucene users
> should be able to use faceting, use a schema, etc.
>
> I think this idea makes a lot of sense and I think now is a good time
> to do it.  Yes, this is a big change, but I think the gains are sizable.
> As Lucene & Solr diverge more, it'll only become harder and harder to
> merge.
>
> Robert's massive patch on SOLR-1657, upgrading most of Solr's analyzers
> to 3.0, is aging... while other changes to analyzers are being
> proposed (SOLR-1799).  If we were integrated (or at least single
> source for analyzers), Robert would already have committed it.
>
> Mike
>
> On Fri, Feb 26, 2010 at 5:20 PM, Yonik Seeley
> <yonik@lucidimagination.com> wrote:
>    
>> On Fri, Feb 26, 2010 at 5:15 PM, Steven A Rowe <sarowe@syr.edu> wrote:
>>      
>>> On 02/24/2010 at 2:20 PM, Yonik Seeley wrote:
>>>        
>>>> I've started to think that a merge of Solr and Lucene would be in the
>>>> best interest of both projects.
>>>>          
>>> The Sorlucene :) merger could be achieved virtually, i.e. via policy, rather
>>> than physically merging:
>>>        
>> Everything is virtual here anyway :-)
>> I agree with Mike that a single dev list is highly desirable.  There
>> would still be separate downloads.  What to do with some of the other
>> stuff is unspecified.
>>
>> Committers would need to be merged though - that's the only way to
>> make a change across projects w/o breaking stuff.
>>
>> -Yonik
>>
>>      
>    

