From: Simon Willnauer
Reply-To: simon.willnauer@gmail.com
To: general@lucene.apache.org
Subject: Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Date: Mon, 1 Mar 2010 18:41:43 +0100
In-Reply-To: <8f0ad1f31003010902m141c3f3am5b358ab50b7bec1e@mail.gmail.com>
References: <17A2FFD8-B7D4-46AF-9748-39B06710AD19@apache.org> <8f0ad1f31003010902m141c3f3am5b358ab50b7bec1e@mail.gmail.com>

IMO the only downside is that we risk a longer release cycle if we
merge. It requires a certain level of discipline, but hasn't that
always been the case?! Everything else seems to be a win for both
communities, and I personally would love to see the communities coming
closer again. I have worked on many analyzers, removing code
duplication while maintaining BW compat, and almost every time we
committed a change it caused a new issue on Solr that could have been
fixed in one go. Concerns that Solr could slow us down in maintaining
BW compat appear invalid to me: the Solr API, as a direct customer of
the Lucene API, would enforce our policy, which is a good thing.

I also agree with Robert that moving Solr into a TLP would make things
even worse.

On Mon, Mar 1, 2010 at 6:02 PM, Robert Muir wrote:
> but Yonik's proposal (or at least some of the ideas from it?) is attractive
> as it seems to solve the real problem that created the duplication in the
> first place, which is not limited to analyzers.
>
> On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Hi Grant,
>>
>> > On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:
>> >
>> >> Hi Robert,
>> >>
>> >> I think my proposal (Solr->TLP) is sort of orthogonal to the whole
>> >> analyzers issue - I was in favor, at the very least, of having a
>> >> separate module/project/whatever that both Solr/Lucene (and whatever
>> >> project) can depend on for the shared analyzer code...
>> >
>> > Not really. They are intimately linked.
>>
>> Ummm, how so? Making project A called "Apache Super Analyzers" and then
>> making Lucene(-java) and Solr depend on Apache Super Analyzers is separate
>> of whether or not Lucene(-java) and Solr are TLPs or not...
>>
>> Cheers,
>> Chris
>>
>> >
>> >>
>> >> Cheers,
>> >> Chris
>> >>
>> >> On 3/1/10 9:12 AM, "Robert Muir" wrote:
>> >>
>> >> this will make the analyzers duplication problem even worse
>> >>
>> >> On Mon, Mar 1, 2010 at 11:06 AM, Mattmann, Chris A (388J) <
>> >> chris.a.mattmann@jpl.nasa.gov> wrote:
>> >>
>> >>> Hi Mark,
>> >>>
>> >>> Thanks for your message. I respect your viewpoint, but I respectfully
>> >>> disagree. It just seems (to me at least based on the discussion) like
>> >>> a TLP for Solr is the way to go.
>> >>>
>> >>> Cheers,
>> >>> Chris
>> >>>
>> >>> On 3/1/10 8:54 AM, "Mark Miller" wrote:
>> >>>
>> >>> On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote:
>> >>>> Hi Mark,
>> >>>>
>> >>>>> That would really be no real world change from how things work
>> >>>>> today. The fact is, today, Solr already operates essentially as an
>> >>>>> independent project.
>> >>>>>
>> >>>> Well if that's the case, then it would lead me to think that it's
>> >>>> more of a TLP more than anything else per best practices.
>> >>>>
>> >>> That depends. It could be argued it should be a top level project or
>> >>> that it should be closer to the Lucene project. Some people are
>> >>> arguing for both approaches right now. There are two directions we
>> >>> could move in.
>> >>>>
>> >>>>> The only real difference is that it shares the same PMC with Lucene
>> >>>>> now and wouldn't with this change. This would address none of the
>> >>>>> issues that triggered the idea for a possible merge.
>> >>>>>
>> >>>> I don't agree -- you're looking to bring together two communities
>> >>>> that are "fairly separate" as you put it. The separation likely
>> >>>> didn't spring up over night and has been this way for a while (at
>> >>>> least to my knowledge). This is exactly the type of situation that
>> >>>> typically leads to TLP creation from what I've seen.
>> >>>>
>> >>> It also causes negatives between Solr/Lucene that some are looking to
>> >>> address. Hence the birth of this proposal. Going TLP with Solr will
>> >>> only aggravate those negatives, not help them.
>> >>>
>> >>> While the communities operate fairly separately at the moment, the
>> >>> people in the communities are not so separate. The committer list has
>> >>> huge overlap. Many committers on one project but not the other do a lot
>> >>> of work on both projects.
>> >>>
>> >>> There is already a strong link with the personnel - merging the
>> >>> management of the projects addresses many of the concerns that have
>> >>> prompted this discussion. TLP'ing Solr only makes those concerns
>> >>> multiply. They would diverge further, and incompatible overlap between
>> >>> them would increase.
>> >>>
>> >>>> Cheers,
>> >>>> Chris
>> >>>>
>> >>>>>
>> >>>>> On 03/01/2010 10:04 AM, Mattmann, Chris A (388J) wrote:
>> >>>>>
>> >>>>>> Hey Grant,
>> >>>>>>
>> >>>>>> I'd like to explore this: does this imply that the Lucene
>> >>>>>> sub-projects will go away and Lucene will turn into Lucene-java
>> >>>>>> and maintain its Apache TLP, and then you'd have say,
>> >>>>>> solr.apache.org, tika.apache.org, mahout.apache.org (already
>> >>>>>> started), etc. etc.? If so, that may be the best of all worlds,
>> >>>>>> allowing project independence, but also not following the Apache
>> >>>>>> "antipattern" as Doug put it...
>> >>>>>>
>> >>>>>> Cheers,
>> >>>>>> Chris
>> >>>>>>
>> >>>>>> On 3/1/10 7:28 AM, "Grant Ingersoll" wrote:
>> >>>>>>
>> >>>>>>> Also, as Doug alluded to, the Board is likely to ask us to consider
>> >>>>>>> less subprojects in the future, so we may be consolidating and
>> >>>>>>> spinning off anyway.
>> >>>>>>>
>> >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>>>>> Chris Mattmann, Ph.D.
>> >>>>>> Senior Computer Scientist
>> >>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> >>>>>> Office: 171-266B, Mailstop: 171-246
>> >>>>>> Email: Chris.Mattmann@jpl.nasa.gov
>> >>>>>> Phone: +1 (818) 354-8810
>> >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>>>>> Adjunct Assistant Professor, Computer Science Department
>> >>>>>> University of Southern California, Los Angeles, CA 90089 USA
>> >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>>>>>
>> >>>>>
>> >>>>> --
>> >>>>> - Mark
>> >>>>>
>> >>>>> http://www.lucidimagination.com
>> >>>>>
>> >>>>
>> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>>> Chris Mattmann, Ph.D.
>> >>>> Senior Computer Scientist
>> >>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> >>>> Office: 171-266B, Mailstop: 171-246
>> >>>> Email: Chris.Mattmann@jpl.nasa.gov
>> >>>> WWW: http://sunset.usc.edu/~mattmann/
>> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>>> Adjunct Assistant Professor, Computer Science Department
>> >>>> University of Southern California, Los Angeles, CA 90089 USA
>> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>>>
>> >>>
>> >>> --
>> >>> - Mark
>> >>>
>> >>> http://www.lucidimagination.com
>> >>>
>> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>> Chris Mattmann, Ph.D.
>> >>> Senior Computer Scientist
>> >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> >>> Office: 171-266B, Mailstop: 171-246
>> >>> Email: Chris.Mattmann@jpl.nasa.gov
>> >>> WWW: http://sunset.usc.edu/~mattmann/
>> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>> Adjunct Assistant Professor, Computer Science Department
>> >>> University of Southern California, Los Angeles, CA 90089 USA
>> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>>
>> >>
>> >> --
>> >> Robert Muir
>> >> rcmuir@gmail.com
>> >>
>> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> Chris Mattmann, Ph.D.
>> >> Senior Computer Scientist
>> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> >> Office: 171-266B, Mailstop: 171-246
>> >> Email: Chris.Mattmann@jpl.nasa.gov
>> >> WWW: http://sunset.usc.edu/~mattmann/
>> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> Adjunct Assistant Professor, Computer Science Department
>> >> University of Southern California, Los Angeles, CA 90089 USA
>> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>
>> >
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: Chris.Mattmann@jpl.nasa.gov
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>
> --
> Robert Muir
> rcmuir@gmail.com
>