lucene-general mailing list archives

From Ryan McKinley <ryan...@gmail.com>
Subject Re: [VOTE] merge lucene/solr development
Date Tue, 09 Mar 2010 07:16:21 GMT
I'm still trying to grok the different points of view and apparent
(mis?) perceptions on what everyone is saying.

Going back to the beginning, the basic problem is that code is
duplicated between solr and lucene and fixing that is difficult with
the current structure.

There is no intention to "merge" solr and lucene - just to figure out a way
to make their development smoother.
 - Lucene is (and will be) the core full text search library and a
bunch of optional packages to support various things like analysis,
query parsing, etc, etc
 - Solr is a full featured search server -- it strings together many
of the lucene libraries with a simplified API (http)

This proposal would make it easier to keep the general stuff (like
numerics, spatial, analysis) out of solr and into optional lucene
packages.  It would not change the nature of either project.


On Mon, Mar 8, 2010 at 11:59 PM, Dennis Kubes <kubes@apache.org> wrote:
> I read the previous discussions on general (although I missed the original
> email by Yonik which I have since read) and I think all of this discussion
> should be happening there, so I copied general to this response. But since
> this vote is occurring on private I thought it most appropriate to respond
> where it occurred.  Wouldn't you agree?
>
> I believe this is a question of identity.  What is Lucene?
>
> IMO Lucene is a full text search library, that is its purpose.  It isn't
> trying to be a search server or a search engine.  It is easy to include as a
> library and is used on everything from embedded servers to www search
> engines.

Absolutely agree.  This proposal would help both projects focus on what they do.

>
> Quoting from Yonik's previous posting:
>
>> Some in Lucene development have expressed a desire to make Lucene more
>> of a complete solution, rather than just a core full-text search
>> library... things like a data schema, faceting, etc.  The Lucene
>> project already has an enterprise search platform with these
>> features... that's Solr.
>
> So is Lucene a full text search library or is it something different? And
> isn't that something different already Solr?  Why should they be the same
> thing when their goals aren't the same?

They would not be the same thing.  They would be different packages
(as they are now).  It would just be easier to manage the development
of many optional search features.

>
>> Trying to pull popular pieces out of Solr
>> makes life harder for Solr developers, brings our projects into
>> conflict, and is often unsuccessful (witness the largely failed
>> migration of FunctionQueries from Solr to Lucene).
>
> I feel for you, really.  I remember trying to develop in Nutch on Hadoop
> 0.04.  But the logic is not correct.  Just because Solr wants X feature and
> Solr uses Lucene != everyone who uses Lucene wants X.  Faceting for example,
> great feature, but not useful in every full text search.
>

The real problem is that both solr and lucene have their own versions
of the same thing.  There is no intention to add *every* feature to
the core, rather make sure that development can be focused somewhere.
Right now there are two versions of function queries, numerics and
spatial... uggg.


>> For Lucene to achieve the ultimate in usability for users, it can't
>> require Java experience... it needs higher level abstractions provided
>> by Solr.
>
> I don't believe this to be true.  If the Lucene community had wanted very
> general language agnostic search, it would have happened by now. Lucene is a
> Java API.  Solr on the other hand is a server and therefore should be
> language agnostic.
>

agree

>> The other benefit to Lucene would be to bring features to developers
>> much sooner... Solr has had features years before they were developed
>> in Lucene, and currently has more developers working with it.
>
> "We have more developers than you do" isn't a valid reason to merge,
> especially in open source software.  Maybe in the corporate world.  IMO if
> Solr has more developers and wants some architecture changed in Lucene and it
> is to the benefit of the entire Lucene community, then those changes can be
> proposed and voted upon.
>
>> Esp with Solr not using Lucene trunk, if a Solr developer wants a
>> feature quickly, they cannot add it to Lucene (even if it might make
>> sense there) since that introduces a big unpredictable lag
>
> Solr has the option of not using Lucene.  If something needs to go into
> Lucene, it should be voted on and support all of the different uses for
> Lucene.  As a friend told me recently, specialization is for insects.

I agree in theory...  but in practice, it means that stuff that
conceptually should live in lucene gets added to solr and later
duplicated in lucene.

>
>> 1) Solr would go back to using Lucene's trunk
>
> Use trunk, don't use trunk.  That is up to the Solr project.  It shouldn't
> influence Lucene's behavior.
>
>> 2) For new Solr features, there would be an effort to abstract it such
>> that non-Solr users could use the functionality (faceting, field
>> collapsing, etc)
>
> Can you say that every feature would be applicable to a full text search
> library?  If not then it is beyond the core responsibilities of Lucene.
>

That would be the litmus test to know if it should be in the lucene
distribution vs solr.

But yes, I think collapsing and faceting (sans-solr) belong in lucene.


>> 3) For new Lucene features, there would be an effort to integrate it
>> into Solr.
>
> No.  Because by specializing towards Solr, or Nutch, or any of the hundred
> other applications that use Lucene, it loses its general applicability.
>  Where would Hadoop be if it never made it past Nutch?

Not sure I follow...  I think this means solr would aim to use the new
lucene features as they are developed.  For example, using the
reusable token streams from the get-go rather than getting stuck in 6
months of stale patches that don't apply.


>
>> 4) Releases would be synchronized... Lucene and Solr would release at
>> the same time.
>
> So synchronize your releases.  Communicate.
>
> I am open to listening to your responses, but all of this is to say my vote
> is still currently -1.
>


ryan
