lucene-dev mailing list archives

From Marvin Humphrey
Subject Re: deprecating Versions
Date Wed, 01 Dec 2010 01:10:45 GMT
On Mon, Nov 29, 2010 at 05:34:27AM -0500, Robert Muir wrote:
> Is it somehow possible i could convince everyone that all the analyzers we
> provide are simply examples?  This way we could really make this a bit more
> reasonable and clean up a lot of stuff.

I understand what you're getting at.  We don't really expect people to fork an
analyzer code base, though -- so we need to draw a line between e.g. the code
that implements StopFilter and stoplist content.  We want the low-level code
to be part of the library, but maybe we want stoplist content to be considered
example code.

> Seems like we really want to move towards a more declarative model where
> these are just config files... so only then it will be ok for us to change them
> because they suddenly aren't suffixed with .java?!

Consider how this might work with e.g. RussianAnalyzer.  The
declaratively-expressed sample analyzer config could contain a hard-coded list
of Russian stop words, and as this hard-coded stoplist would travel with the
index in a config file, there would be no index compatibility problems upon
upgrading Lucene.  The stoplist in the sample config could change, even on
bugfix releases.
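
A minimal sketch of that idea, assuming a hypothetical JSON file named
analyzer.json stored in the index directory (the file name, the config keys,
and the helper functions are all made up for illustration and are not part of
Lucene). The stoplist is copied into the index once, and analysis afterwards
reads only that copy, so a changed sample list in a later release has no
effect on an existing index:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical file name; not a Lucene convention.
CONFIG_NAME = "analyzer.json"


def save_config(index_dir, stopwords):
    """Write the analyzer config, stoplist included, into the index directory."""
    config = {"analyzer": "russian-sample", "stopwords": sorted(stopwords)}
    path = Path(index_dir) / CONFIG_NAME
    path.write_text(json.dumps(config, ensure_ascii=False), encoding="utf-8")


def load_stopwords(index_dir):
    """Read back the stoplist that travels with the index."""
    path = Path(index_dir) / CONFIG_NAME
    config = json.loads(path.read_text(encoding="utf-8"))
    return frozenset(config["stopwords"])


def analyze(text, stopwords):
    """Toy analysis chain: lowercase, split on whitespace, drop stop words."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


def demo():
    """Create a throwaway 'index', store a stoplist in it, and analyze a phrase."""
    with tempfile.TemporaryDirectory() as index_dir:
        save_config(index_dir, {"и", "в", "не"})
        # A later release could ship a different sample list; this index
        # keeps using the copy stored alongside it.
        stopwords = load_stopwords(index_dir)
        return analyze("Кот и пёс на крыше", stopwords)
```

Here "на" survives analysis because it is absent from the stored list, whether
or not a newer sample config would include it.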

Config file syntax could be affected by a Lucene upgrade, but a syntax change
doesn't affect index content, and maintaining back compat for the file format
is straightforward.

Things are more difficult with versioning e.g. stemmers, but I think the
stoplist example illustrates the potential of declarative analyzer
specification.  Maybe specifying Version in a sample file and dispatching to
different revs of a Snowball stemmer is less painful than forcing a user to
figure out Version from API documentation?
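
As a rough sketch of the dispatch, assuming a hypothetical "stemmer_version"
key in the config: the two stemmer functions below are toy stand-ins with
invented rules, not real Snowball revisions, and the registry is not an
existing Lucene API.

```python
def stem_rev1(token):
    # Toy rule standing in for an older stemmer revision: strip a trailing "s".
    return token[:-1] if token.endswith("s") else token


def stem_rev2(token):
    # Toy rule standing in for a newer revision: strip "es" from longer
    # tokens, strip "s" from tokens longer than three characters.
    if len(token) > 4 and token.endswith("es"):
        return token[:-2]
    if len(token) > 3 and token.endswith("s"):
        return token[:-1]
    return token


# Every revision stays available so that old indexes keep their behavior.
STEMMERS = {"1": stem_rev1, "2": stem_rev2}


def stemmer_for(config):
    """Pick the stemmer revision named in the (hypothetical) config."""
    version = config["stemmer_version"]
    try:
        return STEMMERS[version]
    except KeyError:
        raise ValueError(f"unsupported stemmer_version: {version!r}") from None
```

An index built with {"stemmer_version": "1"} keeps getting stem_rev1 after an
upgrade, and the user only sees the version as a value in the sample file.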

Having to extract an Analyzer from an index directory does present the
potential for Analyzer mismatches in a multi-node setup where e.g. the machine
that parses the query string and the machine that executes matching are not
the same.
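
One way such a mismatch could at least be detected (not something proposed
above, just an illustration) is to compare a fingerprint of the config on both
machines. The config layout and function names here are assumptions:

```python
import hashlib
import json


def config_fingerprint(config):
    """Hash a canonical JSON rendering so that key order does not matter."""
    canonical = json.dumps(
        config, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def same_analyzer(query_node_config, search_node_config):
    """True if both nodes would analyze text with an identical config."""
    return config_fingerprint(query_node_config) == config_fingerprint(
        search_node_config
    )
```

The query-parsing node could send its fingerprint along with the parsed query,
and the matching node could reject the request if the value differs from that
of the config stored with its index.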

Marvin Humphrey
