lucene-dev mailing list archives

From Babak Farhang
Subject Re: Modularization
Date Tue, 31 Mar 2009 07:21:32 GMT
> maturity, and their back compat commitments.  The demo and getting
> started guides could also be expanded to reference the contrib jars that
> contain code many people may want to reuse...

Here's an idea. Each contrib is really a project unto its own. And any
project, I suggest, ought to have its own demo program, together maybe
with a small write-up describing the idea behind the contrib and what
the demo does. So to get the ball rolling, how about adopting some
such documentation policy for *future* contribs as a
pseudo-requirement for making it into the official release?


PS this is not a swipe at any upcoming contrib (TrieUtils: the
documentation there is really good :)

On Mon, Mar 30, 2009 at 5:31 PM, Chris Hostetter wrote:
> After stirring things up, and then being off-list for ~10 days, I'm in an
> interesting position coming back to this thread and seeing the discussion
> *after* it essentially ended, with a lot of semi-consensus but no clear
> sense of hard and fast resolution or plan of action.
> FWIW, here are the notes i made based on reading the thread about the
> various sentiments i noticed expressed (whether i agree with them or
> not) in order to try and get a handle on what had been discussed.
> some of these were the opinion of a single person and i've paraphrased,
> others are my generalization of similar comments made by various
> people...
> - contrib has a bad rap
> - widely varying degrees of quality/stability in contrib code, hard to get
> people to rely on the "good" ones because of the "less good" ones
> - many people want a good, out of the box, kitchen sink experience (ie:
> one monolithic jar containing all the "essentials")
> - need easy discoverability of all things of a given type (ie: all
> queries, all filters, all analyzers, etc...) .. ie: combined javadocs.
> - need easy installation of all things of a given type (ie: a jar
> containing all types of queries, a jar containing all types of analyzers,
> etc...)
> - still need to deal with contribs that have external dependencies
> - still need to deal with contribs that require future versions of
> the language (Java 1.7 when core is still 1.5 compat)
> - users need better guidance about "why" something is a contrib
> (additional functionality, alternate functionality, example of use, tool,
> etc...)
> - while we should maintain/increase modularization, documentation should
> make features of contribs more prominent without stressing the isolation
> resulting from code modularization.
> - we should merge all contrib & core code into a unified src/ tree, and
> make the packaging independent of the physical location in svn (ie: jars
> based on java package, not directory)
> While I'm mostly in favor of all of these sentiments, and think it's
> really just a question of how to go about it, the last one is actually
> something i'm pretty strongly opposed to -- I think the best way forward
> is to have lots of small, well isolated source trees.
> code isolation (by directory hierarchy) is the best way i've seen to
> ensure modularization, and protect against inadvertent dependency
> bleeding.  If we want to be able to produce small jars targeted at
> specific goals, and we want FooClass to be in foo.jar and
> the bar class to be in bar.jar then we shouldn't have
> src/java/o/l/a/foo/ and src/java/o/l/a/bar/ --
> doing so makes it way too easy for inadvertent dependencies to crop up
> that make FooClass depend on the bar class, and thus make it impossible to use
> foo.jar without also using bar.jar at runtime.
> it's certainly possible to have "all" source code in a single directory
> hierarchy, and then rely on the build system to ensure you don't have
> unwarranted dependencies, but that requires you to express rules in the
> build system about what exactly the acceptable dependencies are, and it
> relies on everyone using the build system correctly (misguided users of
> hand-holding IDEs could get very frustrated when the patches they submit
> violate rules of an overly complicated set of ant build files)
> FWIW: having lots/more of very small, isolated, hierarchies also wouldn't
> hinder any attempts at having kitchen-sink or "essential" jars --
> combining the classes from lots of little isolated code trees is a lot
> easier than extracting a few classes from one big code tree.
> One underlying assumption that seems to have permeated the existing
> discussion (without ever being explicitly stated) is the idea that most
> of what currently lives in src/java is the "core" and would be a single "module"
> ... personally i'd like to challenge that assumption.  I'd like to suggest
> that besides obvious things that could be refactored out into other
> "modules" (span queries, queryparser) there are lots of additional ways
> that src/java could be sliced...
>  - interfaces and abstract classes and concrete classes for reading an
> index in one index-api.jar (ie: Directory but no FSDirectory; IndexReader
> but not MultiReader)
>  - ditto for creating/updating an index in one index-update.jar (ie:
> IndexWriter, TokenStream, Tokenizer, TokenFilter, Analyzer  but
> not any impls of the last 3)
>  - ditto for searching in index-search.jar (ie: Searcher, Searchable,
> HitCollector, Query ... but not any concrete subclasses)
>  - simple-analysis.jar (SimpleAnalyzer, WhitespaceAnalyzer,
> LetterTokenizer, LowerCaseFilter, etc...)
>  - english-analysis.jar (StandardAnalyzer, etc...)
>  - primitive-queries.jar (TermQuery, BooleanQuery, MatchAllDocsQuery,
> MultiTermQuery, etc...)
>  - range-queries.jar (RangeQuery, RangeFilter, ConstantScoreRangeQuery)
>   ...etc...
> The crux of my point being that what we think of today as the lucene
> "core" is actually kind of big and bloated, and already has *a* kitchen
> sink thrown in -- it's just not necessarily the kitchen sink many people
> want.
> a big percentage of our users may want highlighting by default, and may
> never care about function or span queries -- making it easier to get a
> monolithic jar of *everything* only addresses one of those three
> disconnects (easy access to the highlighting code) but splitting the
> current "core" up into lots of little pieces (aka: "modules") that have
> equal visibility to the existing contribs (now also "modules") would
> address all three disconnects: people wouldn't overlook modules they might
> want (like highlighting) because they are just as easy to find as the "core"
> and people wouldn't wind up with bloated jars containing a lot of code
> they don't need. (beating a dead horse for a moment: this wouldn't
> preclude us from offering a bloated jar containing everything under the
> sun)
> Even without making radical changes to the way our source code is
> organized, a lot of improvements could be made by having better
> documentation ... could certainly
> have more info about what is included in a release, what types of things
> can be found in a contrib, etc...  Individual contrib README files should
> certainly get beefed up to describe their purpose, their level of
> maturity, and their back compat commitments.  The demo and getting
> started guides could also be expanded to reference the contrib jars that
> contain code many people may want to reuse...
>   ...and that's all small improvements that could be made without
> radically changing anything about our source organization or packaging.
> splitting the core up into smaller modules would only help the situation,
> moving more things into the core seems like it would just make the problem
> worse.
> : I agree, but at least we need some clear criteria so the future
> : decision process is more straightforward.  Towards that... it seems
> : like there are good reasons why something should be put into contrib:
> I would argue that is approaching the problem from the wrong direction.
> assume for the moment that we define the list of lucene "modules" as:
>   ls -d contrib/* src/java src/gcj src/demo src/jsp
> ...but in the future we want to split up some of the bigger "modules" and
> move each module so they have equal visibility.
> i would suggest that the operating assumption be that any new code
> contribution that adds functionality (ie: not a bug fix, or an
> enhancement to an existing Impl) belongs in a new "module" unless:
>  1) compilation constraints require that it be put in an existing module
> (ie: needs to introduce a bi-directional dependency with an existing
> class which can't be refactored out into the new module)
>  2) it is a natural conceptual fit with *all* of the existing classes in
> that module (ie: a new ThaiStemmerFilter could be added to an existing
> thai-analysis module)
> (but equally important to the question of "when to add to an existing
> 'module' vs creating a new module?" should be the question of "when to
> split an existing module?" ... something we've never really talked about
> for core or contribs.)
> : But I don't think "it doesn't have to be in core" (the "software
> : modularity" goal) is the right reason to put something in contrib.
> Would it sound like a better reason if we stopped calling it "core" ... i look
> at it from the point of view of: "Are classes A,B&C (which are tightly
> coupled) directly related to classes X,Y&Z (also tightly coupled) ?"
> ... if the answer is "no" then A,B&C do not belong in the same module as
> X,Y&Z ... it doesn't matter which module we're talking about (src/java,
> contrib/highlighter etc...)
> i don't think it makes any sense for the TrieRangeQueries to be in the
> same "module" as IndexWriter, or IndexReader ... but i also don't think it
> makes sense for the trie to be in the same module as BoostingQuery or
> DuplicateFilter -- or for IndexWriter to be in the same module as the
> existing query parser (or for the existing query parser to be in the same
> module as the new one the IBM folks have been working on)
> we can have fine grained modularity w/o having second class citizens, and
> we can achieve it without needing to make radical changes -- but putting
> more stuff into "core" isn't going to help us get there.
> -Hoss
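
The single-tree option described in the quoted message depends on the build system knowing which cross-module dependencies are acceptable. As a rough illustration only, here is a minimal Python sketch of such a rule check. Everything in it is an assumption made for the example: the org.example package prefix, the foo and bar module names, and the allowed-dependency table are hypothetical and are not taken from the Lucene build. It inspects import statements only, so a fully qualified reference with no import would slip past it.

```python
import re

# Hypothetical package prefix; each "module" owns one sub-package beneath it.
PACKAGE_PREFIX = "org.example."

# Matches "import a.b.C;", "import static a.b.C.member;" and "import a.b.*;".
IMPORT_RE = re.compile(
    r"^\s*import\s+(?:static\s+)?([\w.]+)(?:\*)?\s*;", re.MULTILINE
)


def imported_modules(java_source):
    """Return the set of module names that a Java source string imports from."""
    modules = set()
    for name in IMPORT_RE.findall(java_source):
        if name.startswith(PACKAGE_PREFIX):
            # The first package segment after the prefix names the module.
            modules.add(name[len(PACKAGE_PREFIX):].split(".")[0])
    return modules


def find_violations(sources, allowed):
    """List (module, dependency) pairs that the rules in `allowed` do not permit.

    sources: module name -> list of Java source strings in that module
    allowed: module name -> set of modules it may depend on
    """
    violations = set()
    for module, files in sources.items():
        for text in files:
            for dep in imported_modules(text):
                if dep != module and dep not in allowed.get(module, set()):
                    violations.add((module, dep))
    return sorted(violations)


# Hypothetical example: FooClass has picked up an import from the bar module.
sources = {
    "foo": [
        "package org.example.foo;\n"
        "import org.example.bar.BarClass;\n"
        "public class FooClass {}\n"
    ],
    "bar": ["package org.example.bar;\npublic class BarClass {}\n"],
}
allowed = {"foo": set(), "bar": set()}  # neither module may depend on the other

print(find_violations(sources, allowed))  # [('foo', 'bar')]
```

With separate source trees, the same mistake would fail at compile time because bar's classes are simply not on foo's compile path, and no rule table like `allowed` has to be maintained.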