lucene-dev mailing list archives

From Chris Hostetter
Subject Re: Modularization
Date Thu, 09 Apr 2009 22:25:36 GMT

: Then during build we can package up certain combinations.  I think
: there should be sub-kitchen-sink jars by area, eg a jar that contains
: all analyzers/tokenstreams/filters, all queries/filters, etc.

Or just make it trivial to get all jars that fit a given profile w/o
actually merging those jars into an uber-jar ... does maven's
dependency management have anything like "bundles" or "virtual packages" so
we could publish a "lucene-all-analyzers" POM that didn't have an actual
lucene-all-analyzers.jar but listed dependencies on all of the individual
jars?

(FYI: Perl's CPAN has the concept of a "Bundle" that's just an empty
distribution that depends on other distributions so you have a single
reference point for installing them)
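
To make the "bundle" idea concrete, here is a minimal sketch in Python. It is not Maven's or CPAN's actual mechanism, only a model of it: a bundle is a module with no jar of its own that depends on other modules. The Module class, the resolve_jars function, and the module names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    # Hypothetical model: a module has a name, may or may not ship a jar,
    # and may depend on other modules.
    name: str
    has_jar: bool = True
    depends_on: list = field(default_factory=list)


def resolve_jars(root):
    """Return the sorted names of every jar reachable from root."""
    jars, seen, stack = set(), set(), [root]
    while stack:
        mod = stack.pop()
        if mod.name in seen:
            continue  # already visited; also guards against cycles
        seen.add(mod.name)
        if mod.has_jar:
            jars.add(mod.name)
        stack.extend(mod.depends_on)
    return sorted(jars)


# Illustrative module names only; these are not real artifact ids.
snowball = Module("analyzers-snowball")
collation = Module("analyzers-collation")
bundle = Module("all-analyzers", has_jar=False,
                depends_on=[snowball, collation])

print(resolve_jars(bundle))
```

Asking for the bundle pulls in the individual jars, while the bundle itself contributes no jar, so nothing has to be merged into an uber-jar.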

: So, how would you refactor the various sources of
: analyzers/tokenstream/tokenfilters we have today
: (src/java/org/apache/lucene/analysis/*, contrib/snowball/*,
: contrib/collation/* and contrib/analyzers/*)?  (Even contrib/memory
: has a neat PatternAnalyzer, that operates on a string using a regexp
: to get tokens out, that only now am I just discovering).

I think ideally the existing contrib/analysis would be broken up by
language -- even if that means only 2 or 3 classes per jar -- but i don't
deal with multilingual stuff much so i don't have much of an opinion ...
perhaps the majority of our users that deal with non-english tend to deal
with *lots* of languages so having a single "multilingual-analysis" module
would be suitable.

: We also need to think about how this impacts our back-compat policy.
: EG when are we allowed to split up modules into sub-modules, or merge
: them.

splitting a module should always be fair game as long as the new module(s)
maintain the same back compat policy ... it's not a burden to ask people
to start using 2 jars instead of 1 jar (especially if we're already going
to have an easy way to bundle jars up into uber-jars)

in theory merging modules should require that the new module adopt the 
most restrictive back-compat policy of the previous modules.
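
The split and merge rules above can be sketched as follows. This is only an illustration under assumptions: the BackCompat levels and both function names are hypothetical, and the sketch assumes that policies can be ordered from least to most restrictive.

```python
from enum import IntEnum


class BackCompat(IntEnum):
    # Hypothetical policy levels, ordered from least to most restrictive.
    NONE = 0
    MINOR_RELEASES = 1
    MAJOR_RELEASES = 2


def policy_after_merge(policies):
    """A merged module adopts the most restrictive policy of its parts."""
    return max(policies)


def split_is_allowed(original, new_policies):
    """A split is fair game if every new module keeps the original policy."""
    return all(p == original for p in new_policies)


merged = policy_after_merge([BackCompat.NONE, BackCompat.MAJOR_RELEASES])
print(merged.name)
```

Under this model, merging a module with no guarantees into one with strict guarantees leaves the combined module bound by the strict ones.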

: Assuming there's general consensus on this "break core into modules"
: approach, I think the next step is to take an inventory of all of
: Lucene's classes and roughly divide them into proposed modules, and
: iterate on that?  Hoss do you want to take a first stab at that?

Heh.  i'm not sure i could even answer the "want" question in the
affirmative.  This is essentially a question of refactoring, and I think
approaching this incrementally would be the best strategy ... either by
first finding some low hanging fruit in core that could be extracted into
a contrib easily (spans, query parser) or by restructuring the build
system to put contribs and the demo on equal footing with core as
"modules" and reassess as progress is made.

on a personal note: even if i wanted to lead this charge, i really can't 
right now ... folks may have noticed my involvement with lucene has been 
markedly lower in the last few months, i expect it to get even lower over 
the next 2 months before it will (hopefully) get higher. 

