lucene-dev mailing list archives

From "Uwe Schindler" <>
Subject RE: Modularization (was: Re: New flexible query parser)
Date Sat, 21 Mar 2009 11:03:04 GMT

> Honestly, I would not mind much where the source code lives in svn, so
> long as a developer, upon downloading Lucene 2.9, can go to *one*
> place (javadocs) for Lucene's "queries & filters" and see
> {Int,Long}NumberRangeFilter in there.
> We are not there today: a developer must first realize there's a whole
> separate place to look for "other" queries (contrib/queries).  Then
> the developer browses that and likely becomes confused/misled by what
> TrieRangeQuery means (is it a letter trie?).

That is a problem. contrib/queries is a typical example of a
contribution that is almost always used in third-party projects (e.g. Solr):
it is stable, depends on nothing but the core, and is Java 1.4
compatible (at the moment). Other contributions have external dependencies
or need a different Java version than the core.
I would separate the two types of contributions and give the stable,
core-only ones more prominence (for example, by listing them in the
top-level changes list). E.g. when we release 2.9, nobody will notice that
there is a new TrieRangeFilter in contrib/queries, because it is not in the
top-level changes list. The new contrib/spatial should get the same visibility.

> My goal here is Lucene's consumability -- when someone new says "hey I
> heard about this great search library called Lucene; let me go try it
> out" I want that first impression to be as solid as possible.  I think
> this is very important for growing Lucene's community.  This is why
> "out of the box" defaults are so crucial (eg changing IW from flushing
> every 10 docs to every 16 MB gained sizable throughput).
> How many times have we seen a review, article, blog post, etc.,
> comparing Lucene to other search libraries only to incorrectly
> complain because "Lucene can't do XYZ" or "Lucene's indexing
> performance is poor", etc, because they didn't dig in to learn all the
> tunings/options/tricks we all know you are supposed to do?  (It
> frustrates me to no end when this happens).  This then hurts Lucene's
> adoption because others read such articles and conclude Lucene is a
> non-starter.

I know this problem. About the contrib queries: most projects that
use Lucene (e.g. Solr) always use some of the contrib jars, and almost
every time contrib/queries. But newcomers, like the journalists writing those
articles, take only the core and test something with it.

So splitting all of Lucene into separate parts is better, because these
people then always have to think about all the available packages and which
ones they need for their project:

> We all ought to be concerned with Lucene's adoption & growth with time
> (I am), and first-impression consumability / out of the box defaults
> are big drivers of that.
> What if (maybe for 3.0, since we can mix in 1.5 sources at that
> point?) we change how Lucene is bundled, such that core queries and
> contrib/query/* are in one JAR (lucene-query-3.0.jar)?  And
> lucene-analyzers-3.0.jar would include contrib/analyzers/* and
> org/apache/lucene/analysis/*.  And lucene-queryparser.jar, etc.

This is even better! +1

I would propose:
- core: Indexer, Documents, IndexReader, Searcher and the default
directory stores (fs, mmap, nio).
- queries: the current core queries and contrib/queries.
- queryparser (the new one? Or two different packages for old and new?): this
should really be removed from core. A lot of people think that they can
only query Lucene using the query parser and do not even try to build their
Boolean queries manually. They often fail when it gets complicated and the
query parser cannot help or fails, e.g. when querying non-tokenized fields
(but this module would depend on queries, which it needs)...
- analysis: completely remove the analyzers from core and let only the
abstract Analyzer and the keyword analyzer stay there, for when you want to
index without an analyzer or do not need one because you only have
non-tokenized fields.
- highlighting
- custom sorting (separate?)
- spatial
- ...
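
As a rough illustration of the split above, here is a minimal sketch in plain Java (no Lucene classes) of how the modules might depend on each other. The module names come from the list; the dependency edges are assumptions for illustration, except queryparser -> queries, which is the only one stated above. The class and method names (ModuleLayout, proposedModules, classpathFor) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ModuleLayout {

    // Module names follow the proposal. The edges are assumptions,
    // apart from queryparser -> queries, which the proposal states.
    static Map<String, List<String>> proposedModules() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("core", List.of());
        deps.put("queries", List.of("core"));
        deps.put("queryparser", List.of("core", "queries"));
        deps.put("analysis", List.of("core"));
        deps.put("highlighting", List.of("core"));
        deps.put("spatial", List.of("core"));
        return deps;
    }

    // Returns the modules a project needs for the requested module,
    // with every dependency listed before the module that needs it.
    static List<String> classpathFor(String module, Map<String, List<String>> deps) {
        Set<String> ordered = new LinkedHashSet<>();
        collect(module, deps, ordered, new HashSet<>());
        return new ArrayList<>(ordered);
    }

    // Depth-first walk; inProgress detects cycles between modules.
    private static void collect(String module, Map<String, List<String>> deps,
                                Set<String> ordered, Set<String> inProgress) {
        if (!deps.containsKey(module)) {
            throw new IllegalArgumentException("unknown module: " + module);
        }
        if (ordered.contains(module)) {
            return;
        }
        if (!inProgress.add(module)) {
            throw new IllegalStateException("dependency cycle at " + module);
        }
        for (String dep : deps.get(module)) {
            collect(dep, deps, ordered, inProgress);
        }
        inProgress.remove(module);
        ordered.add(module);
    }

    public static void main(String[] args) {
        // Someone who wants the query parser would also need core and queries.
        System.out.println(classpathFor("queryparser", proposedModules()));
    }
}
```

With these assumed edges, asking for queryparser yields core, queries, queryparser, which is the kind of explicit choice a new user would have to make under this layout.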

We could then change our contrib SVN accounts and have new roles
(core-committer, queries-committer, ...).

