From: "Mattmann, Chris A (388J)"
To: "general@lucene.apache.org"
Date: Mon, 1 Mar 2010 10:07:12 -0800
Subject: Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Hi Mike,

I'm not sure I follow this line of thinking: how would Solr being a TLP affect the creation of a separate project/module for Analyzers any more so than it not being a TLP? Both Lucene-java and Solr (as a TLP) could depend on the newly created refactored Analysis project.

Chris


On 3/1/10 10:44 AM, "Michael McCandless" wrote:

If we don't somehow first address the code duplication across the 2 projects, making Solr a TLP will make things worse.

I started here with analysis because I think that's the biggest pain point: it seemed like an obvious first step to fixing the code duplication and thus the most likely to reach some consensus.

And it's also very timely: Robert is right now making all kinds of great fixes to our collective analyzers (in between bouts of fuzzy DFA debugging).

But it goes beyond analyzers: I'd like to see other modules, now in Solr, eventually moved to Lucene, because they really are "core" functionality (e.g. facets, function (and other?) queries, spatial, maybe improvements to spellchecker/highlighter).

How can we do this? And how can we do this so that it "lasts" over time? If new cool "core" things are born in Solr-land (which of course happens a lot -- lots of good healthy usage), how will they find their way back to Lucene?

Yonik's proposal (merging development of Solr/Lucene, but keeping all else separate) would achieve this.

If we do the opposite (Solr -> TLP), how could we possibly achieve this?

I guess one possibility is to just suck it up and duplicate the code. Meaning, each project will have to manually merge fixes in from the other project (so long as there's someone around with the itch to do so). Lucene would copy in all of Solr's analysis, and vice versa (and likewise other dup'd functionality).

I really dislike this solution... it will confuse the daylights out of users, it's error-prone, it's a waste of dev effort, there will always be little differences... but maybe it is in fact the lesser evil?

I would much prefer merging Solr/Lucene development...

Mike

On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J) wrote:
> Hi Grant,
>
>> On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:
>>
>>> Hi Robert,
>>>
>>> I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers
>>> issue - I was in favor, at the very least, of having a separate
>>> module/project/whatever that both Solr/Lucene (and whatever project) can
>>> depend on for the shared analyzer code...
>>
>> Not really. They are intimately linked.
>
> Ummm, how so? Making project A called "Apache Super Analyzers" and then
> making Lucene(-java) and Solr depend on Apache Super Analyzers is separate
> from whether or not Lucene(-java) and Solr are TLPs...
>
> Cheers,
> Chris
>
>
>>
>>
>>>
>>> Cheers,
>>> Chris
>>>
>>>
>>>
>>> On 3/1/10 9:12 AM, "Robert Muir" wrote:
>>>
>>> this will make the analyzers duplication problem even worse
>>>
>>> On Mon, Mar 1, 2010 at 11:06 AM, Mattmann, Chris A (388J) <
>>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> Thanks for your message. I respect your viewpoint, but I respectfully
>>>> disagree. It just seems (to me at least based on the discussion) like a TLP
>>>> for Solr is the way to go.
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>>
>>>>
>>>> On 3/1/10 8:54 AM, "Mark Miller" wrote:
>>>>
>>>> On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote:
>>>>> Hi Mark,
>>>>>
>>>>>
>>>>>> That would really be no real-world change from how things work today. The fact
>>>>>> is, today, Solr already operates essentially as an independent project.
>>>>>>
>>>>> Well if that's the case, then it would lead me to think that it's more of a
>>>>> TLP than anything else per best practices.
>>>>>
>>>> That depends. It could be argued it should be a top-level project or
>>>> that it should be closer to the Lucene project. Some people are arguing
>>>> for both approaches right now. There are two directions we could move in.
>>>>>
>>>>>> The only real difference is that it shares the same PMC with Lucene now and
>>>>>> wouldn't with this change. This would address none of the issues that
>>>>>> triggered the idea for a possible merge.
>>>>>>
>>>>> I don't agree -- you're looking to bring together two communities that are
>>>>> "fairly separate" as you put it. The separation likely didn't spring up
>>>>> overnight and has been this way for a while (at least to my knowledge). This
>>>>> is exactly the type of situation that typically leads to TLP creation from
>>>>> what I've seen.
>>>>>
>>>> It also causes negatives between Solr/Lucene that some are looking to
>>>> address. Hence the birth of this proposal. Going TLP with Solr will only
>>>> aggravate those negatives, not help them.
>>>>
>>>> While the communities operate fairly separately at the moment, the
>>>> people in the communities are not so separate. The committer list has
>>>> huge overlap. Many committers on one project but not the other do a lot
>>>> of work on both projects.
>>>>
>>>> There is already a strong link with the personnel - merging the
>>>> management of the projects addresses many of the concerns that have
>>>> prompted this discussion. TLP'ing Solr only makes those concerns
>>>> multiply. They would diverge further, and incompatible overlap between
>>>> them would increase.
>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> On 03/01/2010 10:04 AM, Mattmann, Chris A (388J) wrote:
>>>>>>
>>>>>>> Hey Grant,
>>>>>>>
>>>>>>> I'd like to explore this: does this imply that the Lucene sub-projects will
>>>>>>> go away and Lucene will turn into Lucene-java and maintain its Apache TLP,
>>>>>>> and then you'd have, say, solr.apache.org, tika.apache.org, mahout.apache.org
>>>>>>> (already started), etc.? If so, that may be the best of all worlds,
>>>>>>> allowing project independence, but also not following the Apache
>>>>>>> "antipattern" as Doug put it...
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 3/1/10 7:28 AM, "Grant Ingersoll" wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Also, as Doug alluded to, the Board is likely to ask us to consider fewer
>>>>>>>> subprojects in the future, so we may be consolidating and spinning off
>>>>>>>> anyway.
>>>>>>>>
>>>>>>>>
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Chris Mattmann, Ph.D.
>>>>>>> Senior Computer Scientist
>>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>>> Office: 171-266B, Mailstop: 171-246
>>>>>>> Email: Chris.Mattmann@jpl.nasa.gov
>>>>>>> Phone: +1 (818) 354-8810
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Adjunct Assistant Professor, Computer Science Department
>>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> - Mark
>>>>>>
>>>>>> http://www.lucidimagination.com
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Robert Muir
>>> rcmuir@gmail.com
>>>
>>
>