lucene-dev mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: modularization discussion
Date Sat, 07 May 2011 10:34:43 GMT
On Sat, May 7, 2011 at 12:30 PM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> I agree: refactoring is TONS of work.  Even cases that seem cut and
> dry, from a distance, quickly prove to be hairy (just ask Robert about
> refactoring analyzers).
>
> However, I think "unproven gain" is too strong.  EG, just a few days
> ago we had a user thread asking how to use auto-suggest outside of
> Solr.  Once we commit the suggest module, this is easy/ier for that
> user, and now we have one more user testing things, finding bugs,
> maybe offering improvements, etc.  I think the gains of each
> refactoring are potentially large, but they are not immediate -- they
> accrue over time.  It's an investment.
>
> Also: I'm in no way asking/expecting other devs to sign up to do
> refactoring (your response seems to imply this).  Nobody can do such a
> thing.  We all scratch our own itches and I'm not asking you to
> scratch mine :)
>
> What I am asking is that if someone wants to scratch this itch (factor
> out XXX as a module), they are fully free to do so, as long as it
> doesn't harm Solr's/Lucene's current functions, performance, etc.  We
> don't seem to have this freedom today, and this is, I think, the core
> conflict.
>
> Grant, if I'm reading your response right, you agree with that freedom
> (others are free to refactor); you're just tempering in a good dose of
> reality ("refactoring is hard"), which I agree with.

Mike, thank you for this email - this is the consensus we need to have!!!

+1 for this... I think this is also what the board report should
contain but I will reply to this separately.

simon
>
> Mike
>
> http://blog.mikemccandless.com
>
> On Thu, May 5, 2011 at 10:25 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>>
>> On May 5, 2011, at 4:15 AM, Simon Willnauer wrote:
>>
>>> Hey folks
>>>
>>> On Tue, May 3, 2011 at 6:49 PM, Michael McCandless
>>> <lucene@mikemccandless.com> wrote:
>>>> Isn't our end goal here a bunch of well factored search modules?  Ie,
>>>> fast forward a year or two and I think we should have modules like
>>>> these:
>>>
>>> I think we have two camps here (10k feet view):
>>>
>>
>> I'd say 3 camps:
>>
>>> 1. wants to move towards modularization and might support all the
>>> modules Mike has listed below
>>> 2. wants to stick with Solr's current architecture and remain
>>> "monolithic" (not negative in this case) as much as possible
>>
>> 3.  Those who think most should be modularized, but realize it's a ton
>> of work for an unproven gain (although most admit it is a highly likely
>> gain) and should be handled on a case-by-case basis as people do the
>> work.  I don't have anything against modularization, I just know, given
>> my schedule, I won't be able to block off weeks of time to do it.  I'm
>> happy to review where/when I can.
>>
>>
>>>
>>> I think we can meet somewhere in between and agree on certain modules
>>> that should be available to Lucene users as well. The ones I have in
>>> mind are primary search features like:
>>> - Faceting
>>
>> Yeah, for instance, Bobo seems to have some interesting faceting
>> implementations that are ASL, perhaps we can combine into this new
>> faceting module.
>>
>>> - Highlighting
>>> - Suggest
>>> - Function Query (consolidation is needed here!)
>>> - Analyzer factories
>>
>> +1.
>>
>>>
>>> things like distribution and replication should remain in solr IMO but
>>> might be moved to a more extensible API so that people can add their
>>> own implementation.
>>
>> And, of course, all the web tier stuff (response writers, inputs, etc.)
>>
>>> I am thinking about things like the ZooKeeper
>>> support that might not be a good solution for everybody where folks
>>> have already JGroups infrastructure.
>>
>> Or other similar solutions.  I wonder about using a ZeroConf
>> implementation that can do self-discovery.
>>
>>> So I think we can work towards 2
>>> distinct goals.
>>> 1. extract common search features into modules
>>> 2. refactor solr to be more "elastic" / "distributed"  and extensible
>>> with respect to those goals.
>>
>> 3. Make it easier for Solr to be programmatically configured by
>> decoupling the reading of schema.xml and solrconfig.xml from the code
>> that actually contains the structures for the properties (IndexSchema
>> and SolrConfig)
>>
>>>
>>> maybe we can get agreement on such a basis though.
>>>
>>> let me know what you think
>>
>> I think it's reasonable.  At the end of the day, it broadens the appeal
>> of both Lucene and Solr.  Solr still exists and is not just a "shell"
>> and at the end of the day, remains the primary choice for people who
>> don't want to stitch everything together themselves.  All of it is
>> easier to contribute to b/c people can focus in on the core area they
>> know w/o having to know everything else per se.  Stuff should be better
>> tested b/c of it as well since it will receive broader use.
>>
>> That being said, and not to be discouraging, but I see it as a ton of work.
>>
>>
>>
>>
>>>
>>> simon
>>>>
>>>>  * Faceting
>>>>
>>>>  * Highlighting
>>>>
>>>>  * Suggest (good patch is on LUCENE-2995)
>>>>
>>>>  * Schema
>>>>
>>>>  * Query impls
>>>>
>>>>  * Query parsers
>>>>
>>>>  * Analyzers (good progress here already, thanks Robert!),
>>>>    incl. factories/XML configuration (still need this)
>>>>
>>>>  * Database import (DIH)
>>>>
>>>>  * Web app
>>>>
>>>>  * Distribution/replication
>>>>
>>>>  * Doc set representations
>>>>
>>>>  * Collapse/grouping
>>>>
>>>>  * Caches
>>>>
>>>>  * Similarity/scoring impls (BM25, etc.)
>>>>
>>>>  * Codecs
>>>>
>>>>  * Joins
>>>>
>>>>  * Lucene core
>>>>
>>>> In this future, much of this code came from what is now Solr and
>>>> Lucene, but we should freely and aggressively poach from other
>>>> projects when appropriate (and license/provenance is OK).
>>>>
>>>> I keep seeing all these cool "compressed int set" projects popping
>>>> up... surely these are useful for us.  Solr poached a doc set impl
>>>> from Nutch; probably there's other stuff to poach from Nutch, Mahout,
>>>> etc.
>>>>
>>>> Katta's doing something sweet with distribution/replication; let's
>>>> poach & merge w/ Solr's approach.  There are various facet impls out
>>>> there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach & merge
>>>> with Solr's.
>>>>
>>>> Elastic Search has lots of cool stuff, too, under ASL2.
>>>>
>>>> All these external open-source projects are fair game for poaching and
>>>> refactoring into shared modules, along with what is now Solr and
>>>> Lucene sources.
>>>>
>>>> In this ideal future, Solr becomes the bundling and default/example
>>>> configuration of the Web App and other modules, much like how the
>>>> various Linux distros bundle different stuff together around the Linux
>>>> kernel.  And if you are an advanced app and don't need the webapp
>>>> part, you can cherry pick the super duper modules you do need and
>>>> directly embed them into your app.
>>>>
>>>> Isn't this the future we are working towards?
>>>>
>>>> Mike
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>
>>>>
>>>
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> Lucene Revolution -- Lucene and Solr User Conference
>> May 25-26 in San Francisco
>> www.lucenerevolution.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

