lucene-general mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: [VOTE] merge lucene/solr development
Date Tue, 09 Mar 2010 14:07:54 GMT
Dropping private@.

On Mar 9, 2010, at 6:30 AM, mark harwood wrote:

> Another 2 cents late to the party.....
> 
>> I believe this is a question of identity.  What is Lucene?
> 
> Absolutely.
> I think one of the clearest differences in outlook between Lucene and Solr is in the
> support for distributed deployments. Solr clearly aims to support distributed deployments
> while Lucene is "just a library".
> Many index operations (faceting, search, top terms) that work in a distributed fashion
> must be written differently to a single-index counterpart.
> If we do aim to share any distribute-capable functionality will Lucene need a brand new
> set of abstractions to avoid binding directly to the Solr server platform? Is that at all
> realistic?
> I speak as someone else who needs to maintain a Lucene extension similar to Solr, but
> where using Solr is not the answer so am keen for Lucene to maintain independence.
> 
> Another potential big difference is any functionality that is Solr-schema-aware. Again,
> would we need to introduce an abstraction for schemas?
> 
> Maybe it's useful to consider what is fundamentally different between Solr and Lucene
> (I suggest schema vs no schema and distributed vs local) and use this to help put a limit
> on what functionality we consider sharing.
> If a function is untainted by a fundamental difference (e.g. Analyzers typically couldn't
> care less about schemas or distribution) then that is a candidate for sharing.
> 
> At the end of this process we get a good idea about what really can be shared.

I agree.  I maintain both Lucene and Solr instances.  Sometimes I need things that live in
Solr but are really Lucene functionality.  Sometimes I need things in Lucene that exist only in Solr.  In the Lucene
instances I maintain/help with, I don't need the Solr server stuff.  So, to me, there will
always need to be that distinction.  At the same time, it is very frustrating for me to write
code that I know belongs in Lucene, but that I put into Solr solely because I need
it for one of the Solr instances and simply can't afford to wait for Solr to be on the appropriate
version of trunk.  Likewise, I may want something for Lucene from Solr but it is a fair amount
of work to bring it up to the new Lucene APIs.

As for the sharing list, I started such a list on the other thread, but can duplicate it here.

To me, there are at least the following:
1. Analyzers
2. Functions
3. Schema (although likely abstracted/reworked)
4. Warming/Reopen - this is hard code to get right and I've seen many people do it wrong.
It is also yet another area of duplication where something started in Solr because for years
the Lucene community had no interest in donating code for it (incRef/decRef).
5. Faceting
6. Spatial

and on and on.  In fact, in my mind, it's pretty much everything other than stuff that is
explicitly to do with Input/Output (Request Handlers, Response Writers)  and HTTP as the server
mechanism.  Even with that list, though, I believe we can keep these separated enough that
people can pick and choose.  In fact, your input, Mark, would be valuable in helping maintain
that distinction.  As they say in the ASF, those who do, decide.
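
To make item 4 (Warming/Reopen) a little more concrete, here is a minimal sketch of the incRef/decRef idea in plain Java.  It is only an illustration under stated assumptions: RefCountedResource, ResourceManager, tryIncRef, acquire, release and swap are hypothetical names, not Lucene or Solr APIs, and warming the new resource before the swap is left out.  The point it tries to show is why this is easy to get wrong: the old resource must stay open until the last in-flight user lets go of it.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an index reader or searcher; not a Lucene class. */
class RefCountedResource {
    private final String name;
    // Starts at 1: the reference held by whoever created/owns the resource.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    RefCountedResource(String name) { this.name = name; }

    String name() { return name; }
    boolean isClosed() { return closed; }

    /** Takes a reference unless the count has already dropped to zero. */
    boolean tryIncRef() {
        while (true) {
            int n = refCount.get();
            if (n <= 0) return false;                 // already released
            if (refCount.compareAndSet(n, n + 1)) return true;
        }
    }

    /** Drops a reference; the last one out "closes" the resource. */
    void decRef() {
        int n = refCount.decrementAndGet();
        if (n == 0) {
            closed = true;                            // real code would free files here
        } else if (n < 0) {
            throw new IllegalStateException("too many decRef calls");
        }
    }
}

/** Hypothetical manager that hands out the current resource and swaps in a new one. */
class ResourceManager {
    private volatile RefCountedResource current;

    ResourceManager(RefCountedResource initial) { this.current = initial; }

    /** Callers must pair every acquire() with a release(). */
    RefCountedResource acquire() {
        while (true) {
            RefCountedResource r = current;
            if (r.tryIncRef()) return r;
            // Lost a race with swap(); loop and pick up the newer resource.
        }
    }

    void release(RefCountedResource r) { r.decRef(); }

    /** Publishes a (presumably already warmed) resource and drops the manager's old reference. */
    synchronized void swap(RefCountedResource fresh) {
        RefCountedResource old = current;
        current = fresh;
        old.decRef();
    }
}
```

In this sketch a search that acquired the old resource before swap() keeps using it safely, and the old resource is only closed once that search calls release().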

-Grant