lucene-dev mailing list archives

From Michael McCandless <>
Subject Re: Modularization
Date Sat, 21 Mar 2009 12:36:28 GMT
> Maybe he actually ends up buying LIA(2) :)

LIA/2 suffers the same false dichotomy, and it drives me crazy there
too: we put all "contrib" packages in a different chapter, even though
it'd make much more sense to cover all analyzers in one chapter, all
queries in one chapter, etc.

I find myself cross-referencing over to TrieRangeQuery in Chapter 8,
from LIA's search chapter (Chapter 3), and it's awkward.

> So yeah I like this and 3.0 is a good opportunity to do this. I
> think a big part of this work should be good documentation. As you
> mentioned, Mike, it should be very simple to get an overview of what
> the different modules are.  So there should be the list of the
> different modules, together with a short description for each of
> them and infos about where to find them (which jar).  Then by
> clicking on e.g. queries, the user would see the list of all queries
> we support.

I agree: revamping the web-site for a better top-down introduction of
Lucene's features should be part of 3.0.

And I don't think the sudden separation of "core" vs "contrib" should
be so prominent (or even visible); it's really a detail of how we
manage source control.

When looking at the website I'd like to read that Lucene can do hit
highlighting, powerful query parsing, spell checking, analysis of
different languages, etc.  I couldn't care less that some of these happen
to live under a "contrib" subdirectory somewhere in the source control

> But I think we should still have "main modules", such as core,
> queries, analyzers, ... and separately e.g. "sandbox modules?", for
> the things currently in contrib that are experimental or, as Mark
> called them, "graveyard contribs" :) ... even though we might then
> as well ask the questions if we can not really bury the latter
> ones...

Could we, instead, adopt some standard way (in the package javadocs)
of stating the maturity/activity/back compat policies/etc of a given

> Since we are just talking about packaging, why can't we have
> both/all of the above?  Individual jars, as well as one "big" jar,
> that contains everything (or, everything that has only dependencies
> we can ship, or "everything" that we deem important for an OOTB
> experience).  I, for one, find it annoying to have to go get
> snowball, analyzers, spellchecking and highlighting separate in most
> cases b/c I almost always use all of them and don't particularly
> care if there are extra classes in a JAR, but can appreciate the
> need to do that in specific instances where leaner versions are
> needed.  After all, the Ant magic to do all of this is pretty
> trivial given we just need to combine the various jars into a single
> jar (while keeping the indiv. ones)


So I think the beginnings of a rough proposal are taking shape, for 3.0:

  1. Fix web site to give a better intro to Lucene's features, without
     exposing the false core vs. contrib distinction (to the Lucene
     consumer)

  2. When releasing, we make a single JAR holding core & contrib
     classes for a given area.  The final JAR files don't contain a
     "core" vs "contrib" distinction.

  3. We create a "bundled" JAR that has the common packages
     "typically" needed (index/search core, analyzers, queries,
     highlighter, spellchecker)
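
To make item 3 concrete, here is a minimal sketch of the "combine the
various jars into a single jar" step. It is illustration only and not the
Ant target we would actually ship: the function name bundle_jars and the
JAR file names are hypothetical, and it assumes that when the same entry
(for example META-INF/MANIFEST.MF) appears in several module JARs, keeping
the first copy is acceptable.

```python
import zipfile


def bundle_jars(module_jars, bundled_jar):
    """Merge several module JARs into one bundled JAR.

    The individual JARs are only read, so they stay available alongside
    the bundle. Returns the entry names that occurred in more than one
    input; for those, only the first occurrence is kept.
    """
    seen = set()
    skipped = []
    with zipfile.ZipFile(bundled_jar, "w", zipfile.ZIP_DEFLATED) as out:
        for jar in module_jars:
            with zipfile.ZipFile(jar) as src:
                for info in src.infolist():
                    if info.is_dir():
                        # Directory entries are optional in a JAR; skip them.
                        continue
                    name = info.filename
                    if name in seen:
                        # Assumption: first occurrence wins (e.g. the manifest).
                        skipped.append(name)
                        continue
                    seen.add(name)
                    out.writestr(name, src.read(name))
    return skipped
```

Calling bundle_jars(["core.jar", "highlighter.jar"], "bundle.jar") with
those hypothetical file names would write bundle.jar and report any
duplicated entries, which is a cheap way to spot packages that collide
between modules.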

