lucene-general mailing list archives

From "Mattmann, Chris A (388J)" <>
Subject Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Date Sun, 28 Feb 2010 17:27:28 GMT
Hi All,

+1, I'm with Ian on this one. Loose coupling is always better in these types of situations...


On 2/28/10 8:07 AM, "Ian Holsman" <> wrote:

I'm not a committer here (or on SOLR), so I can't vote, but I'm
generally against this. On the flip side, I've been using SOLR for
quite a while.

Firstly, SOLR is not the only application that uses lucene as a webservice.

Waiting for SOLR developers to implement re-factorings and changes made
to the core will hamper lucene development,
and things like katta, elastic search, neo4j, and zoie will be treated
like 2nd class citizens and suffer.

It will also hamper innovative new developments, as now 'oh.. this will
break SOLR', or 'SOLR can't use that easily' will stop them. I'm curious
how the NRT enhancements and payload changes would have gone if they had
to wait for SOLR to change stuff to make them work. And most of the SOLR
devs are on the lucene dev list anyway.

SOLR should just be treated like any API user of lucene and lucene
should not be limited by SOLR.

As for the original reason: I support breaking out the analyzers and
making them more generic, or pushing down the changes SOLR (and nutch
and whoever) have made back into the core.

as for the assertion that SOLR is the largest user of lucene, I don't
even know how you could back that up, and even if it is today, that
might change tomorrow.
The web is a fickle place.

so.. I'm pretty happy with how things are going today. lucene is a
library that other things can include. SOLR is a webservice using lucene.

On 2/28/10 5:57 AM, Michael McCandless wrote:
> To make this more concrete, I think this is roughly what's being
> proposed:
>    * Merging the dev lists into a single list.
>    * Merging committers.
>    * When a change is committed to Lucene, it must pass all Solr
>      tests.
>    * Release both at once.
> These things would not change:
>    * Most importantly, the source code would remain factored into
>      separate dirs/modules.
>    * User's lists should remain separate.
>    * Web sites would remain separate.
>    * Solr & Lucene are still separate downloads, separate JARs,
>      separate subdirs in the source tree, etc.
> The outside world still sees Solr & Lucene as separate entities.  It's
> only that they would now be developed/released in synchrony.
> There are some important gains by doing this:
>    * Single source for all the code dup we now have across the
>      projects (my original reason, specifically on analyzers, for
>      starting this).
>    * Whenever a new feature is added to Lucene, we'd work through what
>      the impact is to Solr.  This can still mean we separately develop
>      exposure in Solr, but it'd get us to at least more immediately
>      think about it.
>    * Solr is Lucene's biggest direct user -- most people who use Lucene
>      use it through Solr -- so having it more closely integrated means
>      we know sooner if we broke something.
>    * If we were integrated, I could test right now whether flex
>      breaks anything in Solr.  I can't do that today since Solr
>      isn't upgraded to 3.1.
> Recent big changes (eg segment based searching, Version, attr based
> tokenstream api) caused a lot of work in Solr that could've been much
> smoother had Solr "been there" as we were working through them.
> Recent new features, eg near-real-time search, which are unavailable
> in Solr still, would have at least had some discussion about how to
> expose this in Solr.
> Over time (and we don't have to do this right on day 1) we can make
> core capabilities available to pure Lucene.  EG core Lucene users
> should be able to use faceting, use a schema, etc.
> I think this idea makes a lot of sense and I think now is a good time
> to do it.  Yes, this is a big change, but I think the gains are sizable.
> As Lucene & Solr diverge more, it'll only become harder and harder to
> merge.
> Robert's massive patch on SOLR-1657, upgrading most of Solr's analyzers
> to 3.0, is aging... while other changes to analyzers are being
> proposed (SOLR-1799).  If we were integrated (or at least single
> source for analyzers), Robert would already have committed it.
> Mike
> On Fri, Feb 26, 2010 at 5:20 PM, Yonik Seeley
> <>  wrote:
>> On Fri, Feb 26, 2010 at 5:15 PM, Steven A Rowe<>  wrote:
>>> On 02/24/2010 at 2:20 PM, Yonik Seeley wrote:
>>>> I've started to think that a merge of Solr and Lucene would be in the
>>>> best interest of both projects.
>>> The Sorlucene :) merger could be achieved virtually, i.e. via policy,
>>> rather than physically merging:
>> Everything is virtual here anyway :-)
>> I agree with Mike that a single dev list is highly desirable.  There
>> would still be separate downloads.  What to do with some of the other
>> stuff is unspecified.
>> Committers would need to be merged though - that's the only way to
>> make a change across projects w/o breaking stuff.
>> -Yonik

Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
