lucene-solr-dev mailing list archives

From "Mark Miller (JIRA)" <>
Subject [jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
Date Tue, 19 Jan 2010 01:28:54 GMT


Mark Miller commented on SOLR-1677:

In my opinion this should be real simple. Having to specify a Lucene version for each component
is not simple - it's beyond most users. I think it's beyond me (laugh as you see fit). Having
to accept Lucene 2.4 behavior by default because of Solr back compat issues is also "weak".
A new user should get all the bug fixes of the latest Lucene with minimal effort. Hopefully
no effort. Older users should be able to get the newest with minimal effort as well - not
having to go one by one through each component and upgrading it. I can't imagine juggling
all these versions for each component - that's ugly enough in Lucene - it shouldn't infect
Solr for the average case.

Personally, I do think there should be a global default. And I think right next to it, it
should say, if you change this, you must reindex. No worries about action at a distance. The
action is to get the latest and greatest Lucene has to offer rather than older buggy or back
compat behavior. Reindex, get latest greatest. Don't reindex and you're on your own. Solr might
rip your head off.

We should also offer per component for real experts, but I wouldn't be meddling that way myself
unless in a bind. Solr should be real simple about this - and the latest Solr should use the
latest bug fixes from Lucene, with previous configs out there defaulting to 2.4 compatibility.
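A minimal sketch of the lookup order proposed above: an explicit per-component value wins, then a global default, then 2.4 compatibility for configs that set neither. It assumes a hypothetical `MatchVersion` enum standing in for `o.a.lucene.util.Version`, and it assumes the per-component attribute is named `luceneMatchVersion` (the name the patch gives its decoded field). Neither is confirmed Solr API.

```java
import java.util.Map;

/**
 * Hypothetical resolution order for a component's Lucene match version:
 * per-component attribute > global default > LUCENE_24 back-compat fallback.
 * MatchVersion and the "luceneMatchVersion" key are illustrative assumptions.
 */
final class VersionResolution {

    // Stand-in for o.a.lucene.util.Version (not the real class).
    enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

    // What old configs get when they specify nothing at all.
    static final MatchVersion BACK_COMPAT_DEFAULT = MatchVersion.LUCENE_24;

    /**
     * @param componentArgs attributes of a single tokenizer/filter element
     * @param globalDefault config-wide default, or null if the config has none
     */
    static MatchVersion resolve(Map<String, String> componentArgs, MatchVersion globalDefault) {
        String perComponent = componentArgs.get("luceneMatchVersion");
        if (perComponent != null) {
            // Expert override: valueOf throws IllegalArgumentException on unknown names.
            return MatchVersion.valueOf(perComponent.trim());
        }
        // Global default if present, otherwise keep 2.4 behavior.
        return globalDefault != null ? globalDefault : BACK_COMPAT_DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of(), null));                      // LUCENE_24
        System.out.println(resolve(Map.of(), MatchVersion.LUCENE_30));    // LUCENE_30
        System.out.println(resolve(Map.of("luceneMatchVersion", "LUCENE_29"),
                MatchVersion.LUCENE_30));                                 // LUCENE_29
    }
}
```

With this ordering, the single global setting is the one knob most users touch, and changing it is exactly the point where a reindex is required.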

I abbreviated the heck out of my arguments and thinking, but damn it, that's what I think :)

> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>                 Key: SOLR-1677
>                 URL:
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility
> with old indexes created using older versions of Lucene. The most important example is
> StandardTokenizer, which changed its behaviour with posIncr and incorrect host token types
> in 2.4 and also in
>
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with much more
> Unicode support, almost every Tokenizer/TokenFilter needs this Version parameter. In 2.9,
> the deprecated old ctors without Version take LUCENE_24 as default to mimic the old behaviour,
> e.g. in StandardTokenizer.
>
> This patch adds basic support for the Lucene Version property to the base factories.
> Subclasses can then use the luceneMatchVersion decoded enum (in 3.0) / Parameter (in 2.9)
> for constructing TokenStreams. The code currently contains a helper map to decode the version
> strings, but in 3.0 it can be replaced by Version.valueOf(String), as Version is a Java 5
> enum. The default value is Version.LUCENE_24 (as this is the default for the no-version
> ctors in Lucene).
>
> This patch also removes unneeded conversions to CharArraySet from StopFilterFactory (now
> done by Lucene since 2.9). The generics are also fixed to match Lucene 3.0.
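The base-factory behaviour described in the issue (decode a version string, fall back to LUCENE_24 when none is given, let subclasses branch on the result) can be sketched as below. This is an illustration under stated assumptions, not the patch itself: `MatchVersion` is a hypothetical stand-in for `o.a.lucene.util.Version`, the method names are invented, and the LUCENE_29 cutoff in `useNewBehaviour` is arbitrary. It contrasts a 2.9-style helper map with the 3.0-style `Enum.valueOf` lookup that makes the map unnecessary, and uses an onOrAfter-style comparison by declaration order.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of version decoding in a base factory.
 * MatchVersion, the method names, and the LUCENE_29 cutoff are assumptions.
 */
final class VersionDecoding {

    // Stand-in for o.a.lucene.util.Version; constants are declared in release order.
    enum MatchVersion {
        LUCENE_24, LUCENE_29, LUCENE_30;

        // onOrAfter-style check: later declaration order means a newer release.
        boolean onOrAfter(MatchVersion other) {
            return compareTo(other) >= 0;
        }
    }

    // Used when no version is configured, mirroring the no-version ctors.
    static final MatchVersion DEFAULT = MatchVersion.LUCENE_24;

    // 2.9-style helper map: needed while Version is not a Java enum.
    private static final Map<String, MatchVersion> HELPER_MAP = new HashMap<>();
    static {
        HELPER_MAP.put("LUCENE_24", MatchVersion.LUCENE_24);
        HELPER_MAP.put("LUCENE_29", MatchVersion.LUCENE_29);
        HELPER_MAP.put("LUCENE_30", MatchVersion.LUCENE_30);
    }

    static MatchVersion decodeWithMap(String value) {
        if (value == null) return DEFAULT;
        MatchVersion v = HELPER_MAP.get(value.trim());
        if (v == null) {
            throw new IllegalArgumentException("Unknown luceneMatchVersion: " + value);
        }
        return v;
    }

    // 3.0-style: the enum's own valueOf replaces the map
    // (it throws IllegalArgumentException for unknown names).
    static MatchVersion decodeWithValueOf(String value) {
        return value == null ? DEFAULT : MatchVersion.valueOf(value.trim());
    }

    // What a subclass might do with the decoded version (cutoff is illustrative).
    static boolean useNewBehaviour(MatchVersion matchVersion) {
        return matchVersion.onOrAfter(MatchVersion.LUCENE_29);
    }

    public static void main(String[] args) {
        System.out.println(decodeWithValueOf(null));                     // LUCENE_24
        System.out.println(decodeWithMap("LUCENE_30"));                  // LUCENE_30
        System.out.println(useNewBehaviour(decodeWithValueOf(null)));    // false
        System.out.println(useNewBehaviour(decodeWithMap("LUCENE_29"))); // true
    }
}
```

Because every enum inherits `valueOf(String)`, the 3.0 variant needs no table to keep in sync when new constants are added, which is the simplification the issue describes.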

This message is automatically generated by JIRA.
