Mailing-List: contact general-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: general@lucene.apache.org
Received-SPF: pass (nike.apache.org: domain of uboness@gmail.com designates
 74.125.78.24 as permitted sender)
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=message-id:date:from:user-agent:mime-version:to:subject
         :content-type;
        b=cvFMQGl6RgW7zieNlbmxn3iC1RDaG0G7k8fXfvDgyYxAx21F/GtlitAQ2wuH2b2ThA
         br+0xysLJj7xUYEsghm5O+gjIHUP+cyKRPofJGOcdTnPFbVY5ML0BDAC55E8mWXOlTg2
         TCWiwIdIC/9IZqbbEGfQbE5W8WTQVDQd3Vc2A=
Message-ID: <4B8D4D68.7010708@gmail.com>
Date: Tue, 02 Mar 2010 18:39:52 +0100
From: Uri Boness <uboness@gmail.com>
User-Agent: Thunderbird 2.0.0.23 (X11/20090817)
MIME-Version: 1.0
To: general@lucene.apache.org
Subject: Re: Factor out a standalone,
 shared analysis package for Nutch/Solr/Lucene?
Content-Type: multipart/alternative;
 boundary="------------090801050504020209010707"

--------------090801050504020209010707
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi,

Just found out about this discussion so I realize I'm stepping in rather 
late with my feedback... still for what it's worth, here it is :-).

In general I'm against this proposal as I believe it can cause more 
harm than good. The way I (and many others) see it, Lucene is a separate 
effort from Solr. I'm a *big* fan of Solr and (as some of you may know) 
I'm using it daily and promoting it where/when I can. That said, I'm 
also a big fan of Lucene and I believe Solr has its value and use cases 
while Lucene has its own.

Joining Solr with Lucene has the potential of creating a "virtual" 
monopoly over Solr-like solutions built on top of Lucene which is not 
community friendly but more importantly it puts the competition for Solr 
in jeopardy. IMO competition is a key advantage for products/projects. 
Yes, there is competition that will always come from the commercial 
vendors, but competition and challenges must also come from the open 
source community. This is a big part of what drives innovation. 
Furthermore, the community and the users of Lucene should have the 
power/ability to decide which solutions they want to go for - this is 
the true community-driven way of development.

I fully agree that there is a lot of duplication in the work that is 
currently being done in Solr. But it mainly originates in Solr, not in 
Lucene and the Lucene community should not be bothered by that. Such 
duplicate work should be addressed in the Solr project. So for example, 
take the analysis code... if all the work that has gone into the 
analyzers in Solr had been committed to Lucene from the start, 
there wouldn't have been any duplication. The same goes for the spatial 
support or other duplicate work. Solr development has certainly proven 
to push Lucene development in many ways, and the best way to handle it 
is to contribute all this goodness back to Lucene. And yes, it means 
that Solr releases will need to wait for official Lucene releases, or 
in the meantime have their own custom Lucene distributions, but this is 
the fair play that all Lucene-based solutions (be it Katta, 
ElasticSearch, Sensei, or any other) will have to deal with.

>  Merging committers.
I believe this will create a proliferation of committers on these 
projects which can bring a lot of mess. Let Lucene committers focus on 
what they do and know best - which is Lucene, and let Solr committers 
focus on Solr. If a Solr committer can bring a lot of value to Lucene, 
then yes, sure, make him/her a Lucene committer, but IMO being a Solr 
committer doesn't automatically give anyone the credentials or the 
skills to be a Lucene committer... mainly because the work done in Solr 
is often at a higher level and often not related to Lucene at all.
> Single source for all the code dup we now have across the
>     projects (my original reason, specifically on analyzers, for
>     starting this).
As mentioned above, this can easily be done by contributing the changes 
to the analyzers back to Lucene.

> Whenever a new feature is added to Lucene, we'd work through what
>     the impact is to Solr.  This can still mean we separately develop
>     exposure in Solr, but it'd get us to at least more immediately
>     think about it.
This is something that Solr committers need to be responsible for, not 
Lucene committers. Lucene committers need to make sure that Lucene works 
and is bug free. I don't think it makes sense to push Solr 
responsibilities on to Lucene committers.

> Solr is Lucene's biggest direct user -- most people who use Lucene
>     use it through Solr -- so having it more closely integrated means
>     we know sooner if we broke something.
>   
I disagree here. I believe Lucene still has a larger install base than 
Solr. Think of Jackrabbit, which uses Lucene directly, and all the CMSs 
that use Jackrabbit. Think of frameworks like Compass and Hibernate 
Search (which use Lucene directly) that are used in a lot of JEE 
deployments around the world. And certainly there are a lot of large 
infrastructures that use Lucene directly as well (LinkedIn, for 
example). Solr is great at what it does but it is certainly not 
everything when it comes to open source search or Lucene.

> Right now I could test whether flex breaks anything in Solr.  I
>     can't do that now since Solr is isn't upgraded to 3.1.
True, but again, this is an issue Solr committers will have to deal 
with. And yes, it means that Solr will almost always be one step behind 
Lucene, but that's how it works with every dependency on every library 
you use. If you want to test the flex stuff and it's currently being 
developed as a separate Lucene branch, then you can create a separate 
Solr branch to see how it works and what future changes might need to go 
into Solr. Again, Lucene committers shouldn't bother with this problem 
and the development of Lucene shouldn't be affected by Solr-related 
issues.

Also take into account the huge difference in the release cycles between 
the projects. Lucene has quite a steady release cycle (last year it was 
fairly constant at a release every 3 months or so). Solr, on the other 
hand, has longer release cycles that can span more than a year. A lot of 
the issues that stall Solr releases have nothing to do with Lucene and 
Lucene's release cycle shouldn't suffer from that. Furthermore, 
users/projects/products that use Lucene directly should not suffer from 
that either. All the goodness that is developed in Lucene and all the 
bug fixes should be available to Lucene users to download as soon as 
they're ready - they don't need to suffer from any Solr-related issues.

Please rest assured that my goal here is not to step on anyone's toes. 
I'm not a committer on either project but I certainly want to see these 
two projects go in the right direction (at least the direction I believe 
is right). So I just wanted to express my concerns here.

Keep up the good work!

Cheers,
Uri

--------------090801050504020209010707--