lucene-general mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Date Mon, 01 Mar 2010 18:25:20 GMT
Because the code duplication in the analyzers is only one of the problems
to solve.  In fact, it's the easiest of them (that's why I proposed it,
and only it, first).

A more differentiating example is a much less mature module....

E.g., take spatial -- if Solr were its own TLP, how could spatial be
built out in a way that we don't waste effort, and so that both direct
Lucene and Solr users could use it when it's released?

Mike

On Mon, Mar 1, 2010 at 1:07 PM, Mattmann, Chris A (388J)
<chris.a.mattmann@jpl.nasa.gov> wrote:
> Hi Mike,
>
> I'm not sure I follow this line of thinking: how would Solr being a TLP
> affect the creation of a separate project/module for Analyzers any more
> so than it not being a TLP? Both Lucene-java and Solr (as a TLP) could
> depend on the newly created refactored Analysis project.
>
> Chris
>
>
>
> On 3/1/10 10:44 AM, "Michael McCandless" <lucene@mikemccandless.com> wrote:
>
> If we don't somehow first address the code duplication across the 2
> projects, making Solr a TLP will make things worse.
>
> I started here with analysis because I think that's the biggest pain
> point: it seemed like an obvious first step to fixing the code
> duplication and thus the most likely to reach some consensus.  And
> it's also very timely: Robert is right now making all kinds of great
> fixes to our collective analyzers (in between bouts of fuzzy DFA
> debugging).
>
> But it goes beyond analyzers: I'd like to see other modules, now in
> Solr, eventually moved to Lucene, because they really are "core"
> functionality (eg facets, function (and other?) queries, spatial,
> maybe improvements to spellchecker/highlighter).  How can we do this?
>
> And how can we do this so that it "lasts" over time?  If new cool
> "core" things are born in Solr-land (which of course happens a lot --
> lots of good healthy usage), how will they find their way back to
> Lucene?
>
> Yonik's proposal (merging development of Solr/Lucene, but keeping all
> else separate) would achieve this.
>
> If we do the opposite (Solr -> TLP), how could we possibly achieve
> this?
>
> I guess one possibility is to just suck it up and duplicate the code.
> Meaning, each project will have to manually merge fixes in from the
> other project (so long as there's someone around with the itch to do
> so).  Lucene would copy in all of Solr's analysis, and vice-versa (and
> likewise other dup'd functionality).  I really dislike this
> solution... it will confuse the daylights out of users, it's
> error-prone, it's a waste of dev effort, there will always be little
> differences... but maybe it is in fact the lesser evil?
>
> I would much prefer merging Solr/Lucene development...
>
> Mike
>
> On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J)
> <chris.a.mattmann@jpl.nasa.gov> wrote:
>> Hi Grant,
>>
>>> On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:
>>>
>>>> Hi Robert,
>>>>
>>>> I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers
>>>> issue - I was in favor, at the very least, of having a separate
>>>> module/project/whatever that both Solr/Lucene (and whatever project) can
>>>> depend on for the shared analyzer code...
>>>
>>> Not really.  They are intimately linked.
>>
>> Ummm, how so? Making project A called "Apache Super Analyzers" and then
>> making Lucene(-java) and Solr depend on Apache Super Analyzers is separate
>> from whether or not Lucene(-java) and Solr are TLPs...
>>
>> Cheers,
>> Chris
>>
>>
>>>
>>>
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>>
>>>>
>>>> On 3/1/10 9:12 AM, "Robert Muir" <rcmuir@gmail.com> wrote:
>>>>
>>>> this will make the analyzers duplication problem even worse
>>>>
>>>> On Mon, Mar 1, 2010 at 11:06 AM, Mattmann, Chris A (388J) <
>>>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>>>
>>>>> Hi Mark,
>>>>>
>>>>> Thanks for your message. I respect your viewpoint, but I respectfully
>>>>> disagree. It just seems (to me at least based on the discussion) like
>>>>> a TLP for Solr is the way to go.
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>>
>>>>>
>>>>> On 3/1/10 8:54 AM, "Mark Miller" <markrmiller@gmail.com> wrote:
>>>>>
>>>>> On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote:
>>>>>> Hi Mark,
>>>>>>
>>>>>>
>>>>>>> That would really be no real world change from how things work
>>>>>>> today. The fact is, today, Solr already operates essentially as an
>>>>>>> independent project.
>>>>>>>
>>>>>> Well if that's the case, then it would lead me to think that it's
>>>>>> more of a TLP than anything else per best practices.
>>>>>>
>>>>> That depends. It could be argued it should be a top level project or
>>>>> that it should be closer to the Lucene project. Some people are arguing
>>>>> for both approaches right now. There are two directions we could move
>>>>> in.
>>>>>>
>>>>>>> The only real difference is that it shares the same PMC with
>>>>>>> Lucene now and wouldn't with this change. This would address none
>>>>>>> of the issues that triggered the idea for a possible merge.
>>>>>>>
>>>>>> I don't agree -- you're looking to bring together two communities
>>>>>> that are "fairly separate" as you put it. The separation likely
>>>>>> didn't spring up overnight and has been this way for a while (at
>>>>>> least to my knowledge). This is exactly the type of situation that
>>>>>> typically leads to TLP creation from what I've seen.
>>>>>>
>>>>> It also causes negatives between Solr/Lucene that some are looking to
>>>>> address. Hence the birth of this proposal. Going TLP with Solr will only
>>>>> aggravate those negatives, not help them.
>>>>>
>>>>> While the communities operate fairly separately at the moment, the
>>>>> people in the communities are not so separate. The committer list has
>>>>> huge overlap. Many committers on one project but not the other do a lot
>>>>> of work on both projects.
>>>>>
>>>>> There is already a strong link with the personnel - merging the
>>>>> management of the projects addresses many of the concerns that have
>>>>> prompted this discussion. TLP'ing Solr only makes those concerns
>>>>> multiply. They would diverge further, and incompatible overlap between
>>>>> them would increase.
>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 03/01/2010 10:04 AM, Mattmann, Chris A (388J) wrote:
>>>>>>>
>>>>>>>> Hey Grant,
>>>>>>>>
>>>>>>>> I'd like to explore this: does this imply that the Lucene
>>>>>>>> sub-projects will go away and Lucene will turn into Lucene-java
>>>>>>>> and maintain its Apache TLP, and then you'd have say,
>>>>>>>> solr.apache.org, tika.apache.org, mahout.apache.org (already
>>>>>>>> started), etc. etc.? If so, that may be the best of all worlds,
>>>>>>>> allowing project independence, but also not following the Apache
>>>>>>>> "antipattern" as Doug put it...
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Chris
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/1/10 7:28 AM, "Grant Ingersoll" <gsingers@apache.org> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Also, as Doug alluded to, the Board is likely to ask us to
>>>>>>>>> consider fewer subprojects in the future, so we may be
>>>>>>>>> consolidating and spinning off anyway.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>> Chris Mattmann, Ph.D.
>>>>>>>> Senior Computer Scientist
>>>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>>>> Office: 171-266B, Mailstop: 171-246
>>>>>>>> Email: Chris.Mattmann@jpl.nasa.gov
>>>>>>>> Phone: +1 (818) 354-8810
>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>> Adjunct Assistant Professor, Computer Science Department
>>>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> - Mark
>>>>>>>
>>>>>>> http://www.lucidimagination.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> - Mark
>>>>>
>>>>> http://www.lucidimagination.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Robert Muir
>>>> rcmuir@gmail.com
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>
>
>
>
