Message-ID: <4B8D4D68.7010708@gmail.com>
Date: Tue, 02 Mar 2010 18:39:52 +0100
From: Uri Boness
To: general@lucene.apache.org
Subject: Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="------------090801050504020209010707"

--------------090801050504020209010707
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi,

I just found out about this discussion, so I realize I'm stepping in rather late with my feedback... still, for what it's worth, here it is :-).

In general I'm against this proposal, as I believe it can cause more harm than good. The way I (and many others) see it, Lucene is a separate effort from Solr. I'm a *big* fan of Solr and (as some of you may know) I'm using it daily and promoting it where/when I can. That said, I'm also a big fan of Lucene, and I believe Solr has its value and use cases while Lucene has its own.

Joining Solr with Lucene has the potential to create a "virtual" monopoly over Solr-like solutions built on top of Lucene, which is not community friendly and, more importantly, puts the competition for Solr in jeopardy. IMO competition is a key advantage for products/projects. Yes, there is competition that will always come from the commercial vendors, but competition and challenges must also come from the open source community. This is a big part of what drives innovation.
Furthermore, the community and the users of Lucene should have the power/ability to decide which solutions they want to go for - this is the true community-driven way of development.

I fully agree that there is a lot of duplication in the work that is currently being done in Solr. But it mainly originates in Solr, not in Lucene, and the Lucene community should not be bothered by that. Such duplicate work should be addressed in the Solr project. So for example, take the analysis code... if all the work that has gone into the analyzers in Solr had been committed to Lucene from the start, there would have been no duplication. The same goes for the spatial support and other duplicate work. Solr development has certainly proven to push Lucene development in many ways, and the best way to handle it is to contribute all this goodness back to Lucene. And yes, it means that Solr releases will need to wait for official Lucene releases, or in the meantime have their own custom Lucene distributions, but this is the fair play that all Lucene-based solutions (be it Katta, ElasticSearch, Sensei, or any other) will have to deal with.

> Merging committers.

I believe this will create a proliferation of committers on these projects, which can bring a lot of mess. Let Lucene committers focus on what they do and know best - which is Lucene - and let Solr committers focus on Solr. If a Solr committer can bring a lot of value to Lucene, then yes, sure, make him/her a Lucene committer, but IMO being a Solr committer doesn't automatically give anyone the credentials or the skills to be a Lucene committer... mainly because the work done in Solr is often at a higher level and often not related to Lucene at all.

> Single source for all the code dup we now have across the
> projects (my original reason, specifically on analyzers, for
> starting this).

As mentioned above, this can easily be done by contributing the changes to the analyzers back to Lucene.

> Whenever a new feature is added to Lucene, we'd work through what
> the impact is to Solr. This can still mean we separately develop
> exposure in Solr, but it'd get us to at least more immediately
> think about it.

This is something that Solr committers need to be responsible for, not Lucene committers. Lucene committers need to make sure that Lucene works and is bug free. I don't think it makes sense to push Solr responsibilities onto Lucene committers.

> Solr is Lucene's biggest direct user -- most people who use Lucene
> use it through Solr -- so having it more closely integrated means
> we know sooner if we broke something.
>

I disagree here. I believe Lucene still has a larger install base than Solr. Think of Jackrabbit, which uses Lucene directly, and all the CMSs that use Jackrabbit. Think of frameworks like Compass and Hibernate Search (which use Lucene directly) that are used in a lot of JEE deployments around the world. And there are certainly a lot of large infrastructures that use Lucene directly as well (LinkedIn, for example). Solr is great at what it does, but it is certainly not everything when it comes to open source search or Lucene.

> Right now I could test whether flex breaks anything in Solr. I
> can't do that now since Solr is isn't upgraded to 3.1.

True, but again, this is an issue Solr committers will have to deal with. And yes, it means that Solr will almost always be one step behind Lucene, but that's how it works with every dependency on every library you use. If you want to test the flex stuff and it's currently being developed on a separate Lucene branch, then you can create a separate Solr branch to see how it works and what future changes might need to go into Solr. Again, Lucene committers shouldn't have to bother with this problem, and the development of Lucene shouldn't be affected by Solr-related issues.

Also take into account the huge difference in the release cycles between the projects.
Lucene has quite a steady release cycle (last year it was fairly constant at one release every 3 months or so). Solr, on the other hand, has longer release cycles that can span more than a year. A lot of the issues that stall Solr releases have nothing to do with Lucene, and the Lucene release cycle shouldn't suffer from that. Furthermore, users/projects/products that use Lucene directly should not suffer from that either. All the goodness that is developed in Lucene and all the bug fixes should be available for Lucene users to download as soon as they're ready - they shouldn't have to suffer from any Solr-related issues.

Please rest assured that my goal here is not to step on anyone's toes. I'm not a committer on either project, but I certainly want to see these two projects go in the right direction (at least the direction I believe is right). So I just wanted to express my concerns here.

Keep up the good work!

Cheers,
Uri

--------------090801050504020209010707--