lucene-general mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Date Tue, 02 Mar 2010 07:17:13 GMT
Hey Mike,

> This looks great!

Thanks!

>
> But, the goal is to make a standalone toolkit exposing GIS functions,
> right?

Yep you got it!

>
> My original question (integrating this into Lucene/Solr) remains.

Sure, I think the goal would be to provide only the Spatial aspects required
by Search (e.g., filters for documents, field types, etc.) as small classes
in Lucene/Solr-land, and do the heavy lifting in the SIS project.

>
> EG there's a lot of good work happening now in Solr to make spatial
> search available.  How will that find its way back to Lucene?  Lucene
> has its own (now duplicate) spatial package that was already
> developed.  Users will now be confused about the two, each have
> different bugs/features, etc.

I think as we move towards having an official SIS/spatial project and start
to have releases/libraries, etc., it could partially help, but not totally
alleviate, this issue.

Cheers,
Chris

>
> On Mon, Mar 1, 2010 at 1:28 PM, Mattmann, Chris A (388J)
> <chris.a.mattmann@jpl.nasa.gov> wrote:
>> I'm glad that you brought that up! :)
>>
>> Check out:
>>
>> http://incubator.apache.org/projects/sis.html
>>
>> We're just starting to tackle that very issue right
>> now...patches/ideas/contributions welcome.
>>
>> Cheers,
>> Chris
>>
>>
>>
>> On 3/1/10 11:25 AM, "Michael McCandless" <lucene@mikemccandless.com> wrote:
>>
>> Because the code dup with analyzers is only one of the problems to
>> solve.  In fact, it's the easiest of the problems to solve (that's why
>> I proposed it, only, first).
>>
>> A more differentiating example is a much less mature module....
>>
>> EG take spatial -- if Solr were its own TLP, how could spatial be
>> built out in a way that we don't waste effort, and so that both direct
>> Lucene and Solr users could use it when it's released?
>>
>> Mike
>>
>> On Mon, Mar 1, 2010 at 1:07 PM, Mattmann, Chris A (388J)
>> <chris.a.mattmann@jpl.nasa.gov> wrote:
>>> Hi Mike,
>>>
>>> I'm not sure I follow this line of thinking: how would Solr being a TLP
>>> affect the creation of a separate project/module for Analyzers any more so
>>> than it not being a TLP? Both Lucene-java and Solr (as a TLP) could depend
>>> on the newly created refactored Analysis project.
>>>
>>> Chris
>>>
>>>
>>>
>>> On 3/1/10 10:44 AM, "Michael McCandless" <lucene@mikemccandless.com> wrote:
>>>
>>> If we don't somehow first address the code duplication across the 2
>>> projects, making Solr a TLP will make things worse.
>>>
>>> I started here with analysis because I think that's the biggest pain
>>> point: it seemed like an obvious first step to fixing the code
>>> duplication and thus the most likely to reach some consensus.  And
>>> it's also very timely: Robert is right now making all kinds of great
>>> fixes to our collective analyzers (in between bouts of fuzzy DFA
>>> debugging).
>>>
>>> But it goes beyond analyzers: I'd like to see other modules, now in
>>> Solr, eventually moved to Lucene, because they really are "core"
>>> functionality (eg facets, function (and other?) queries, spatial,
>>> maybe improvements to spellchecker/highlighter).  How can we do this?
>>>
>>> And how can we do this so that it "lasts" over time?  If new cool
>>> "core" things are born in Solr-land (which of course happens a lot --
>>> lots of good healthy usage), how will they find their way back to
>>> Lucene?
>>>
>>> Yonik's proposal (merging development of Solr/Lucene, but keeping all
>>> else separate) would achieve this.
>>>
>>> If we do the opposite (Solr -> TLP), how could we possibly achieve
>>> this?
>>>
>>> I guess one possibility is to just suck it up and duplicate the code.
>>> Meaning, each project will have to manually merge fixes in from the
>>> other project (so long as there's someone around with the itch to do
>>> so).  Lucene would copy in all of Solr's analysis, and vice-versa (and
>>> likewise other dup'd functionality).  I really dislike this
>>> solution... it will confuse the daylights out of users, it's error
>>> prone, it's a waste of dev effort, there will always be little
>>> differences... but maybe it is in fact the lesser evil?
>>>
>>> I would much prefer merging Solr/Lucene development...
>>>
>>> Mike
>>>
>>> On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J)
>>> <chris.a.mattmann@jpl.nasa.gov> wrote:
>>>> Hi Grant,
>>>>
>>>>> On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:
>>>>>
>>>>>> Hi Robert,
>>>>>>
>>>>>> I think my proposal (Solr->TLP) is sort of orthogonal to the whole
>>>>>> analyzers issue - I was in favor, at the very least, of having a
>>>>>> separate module/project/whatever that both Solr/Lucene (and whatever
>>>>>> project) can depend on for the shared analyzer code...
>>>>>
>>>>> Not really.  They are intimately linked.
>>>>
>>>> Ummm, how so? Making project A called "Apache Super Analyzers" and then
>>>> making Lucene(-java) and Solr depend on Apache Super Analyzers is separate
>>>> of whether or not Lucene(-java) and Solr are TLPs or not...
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 3/1/10 9:12 AM, "Robert Muir" <rcmuir@gmail.com> wrote:
>>>>>>
>>>>>> this will make the analyzers duplication problem even worse
>>>>>>
>>>>>> On Mon, Mar 1, 2010 at 11:06 AM, Mattmann, Chris A (388J) <
>>>>>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>>>>>
>>>>>>> Hi Mark,
>>>>>>>
>>>>>>> Thanks for your message. I respect your viewpoint, but I respectfully
>>>>>>> disagree. It just seems (to me at least based on the discussion)
>>>>>>> like a TLP for Solr is the way to go.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 3/1/10 8:54 AM, "Mark Miller" <markrmiller@gmail.com> wrote:
>>>>>>>
>>>>>>> On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote:
>>>>>>>> Hi Mark,
>>>>>>>>
>>>>>>>>
>>>>>>>>> That would really be no real world change from how things work
>>>>>>>>> today. The fact is, today, Solr already operates essentially as an
>>>>>>>>> independent project.
>>>>>>>>>
>>>>>>>> Well if that's the case, then it would lead me to think that it's
>>>>>>>> more of a TLP than anything else per best practices.
>>>>>>>>
>>>>>>> That depends. It could be argued it should be a top level project or
>>>>>>> that it should be closer to the Lucene project. Some people are
>>>>>>> arguing for both approaches right now. There are two directions we
>>>>>>> could move in.
>>>>>>>>
>>>>>>>>> The only real difference is that it shares the same PMC with Lucene
>>>>>>>>> now and wouldn't with this change. This would address none of the
>>>>>>>>> issues that triggered the idea for a possible merge.
>>>>>>>>>
>>>>>>>> I don't agree -- you're looking to bring together two communities
>>>>>>>> that are "fairly separate" as you put it. The separation likely
>>>>>>>> didn't spring up overnight and has been this way for a while (at
>>>>>>>> least to my knowledge). This is exactly the type of situation that
>>>>>>>> typically leads to TLP creation from what I've seen.
>>>>>>>>
>>>>>>> It also causes negatives between Solr/Lucene that some are looking to
>>>>>>> address. Hence the birth of this proposal. Going TLP with Solr will
>>>>>>> only aggravate those negatives, not help them.
>>>>>>>
>>>>>>> While the communities operate fairly separately at the moment, the
>>>>>>> people in the communities are not so separate. The committer list has
>>>>>>> huge overlap. Many committers on one project but not the other do a
>>>>>>> lot of work on both projects.
>>>>>>>
>>>>>>> There is already a strong link with the personnel - merging the
>>>>>>> management of the projects addresses many of the concerns that have
>>>>>>> prompted this discussion. TLP'ing Solr only makes those concerns
>>>>>>> multiply. They would diverge further, and incompatible overlap
>>>>>>> between them would increase.
>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Chris
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/01/2010 10:04 AM, Mattmann, Chris A (388J) wrote:
>>>>>>>>>
>>>>>>>>>> Hey Grant,
>>>>>>>>>>
>>>>>>>>>> I'd like to explore this -- does this imply that the Lucene
>>>>>>>>>> sub-projects will go away and Lucene will turn into Lucene-java and
>>>>>>>>>> maintain its Apache TLP, and then you'd have say, solr.apache.org,
>>>>>>>>>> tika.apache.org, mahout.apache.org (already started), etc. etc.? If
>>>>>>>>>> so, that may be the best of all worlds, allowing project
>>>>>>>>>> independence, but also not following the Apache "antipattern" as
>>>>>>>>>> Doug put it...
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Chris
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/1/10 7:28 AM, "Grant Ingersoll" <gsingers@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Also, as Doug alluded to, the Board is likely to ask us to consider
>>>>>>>>>>> fewer subprojects in the future, so we may be consolidating and
>>>>>>>>>>> spinning off anyway.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>> Chris Mattmann, Ph.D.
>>>>>>>>>> Senior Computer Scientist
>>>>>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>>>>>> Office: 171-266B, Mailstop: 171-246
>>>>>>>>>> Email: Chris.Mattmann@jpl.nasa.gov
>>>>>>>>>> Phone: +1 (818) 354-8810
>>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>> Adjunct Assistant Professor, Computer Science Department
>>>>>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> - Mark
>>>>>>>>>
>>>>>>>>> http://www.lucidimagination.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> - Mark
>>>>>>>
>>>>>>> http://www.lucidimagination.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Robert Muir
>>>>>> rcmuir@gmail.com
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


