lucene-solr-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
Date Tue, 05 Jan 2010 21:51:54 GMT


Robert Muir commented on SOLR-1677:

bq. Oh come on now ... that's not really a fair criticism of the example: there are plenty
of legitimate ways to use (some) TokenFilters only at search time and I specifically structured
my example to point out potential problems in cases just like that - Carl was very clear that
"if you used FooTokenFilterFactory in an index analyzer you'll need to reindex."

I disagree. Version applies to all of Lucene (to more than just tokenstreams), so if Carl
implies that you can bump Version without reindexing simply because you aren't using X
or Y or Z, then he should be renamed Oscar.

bq. You could now argue that User Dwight is an idiot because he didn't warn Bob that other
Analyzers/Tokenizers/TokenFilters might be affected. But that just leads us to scenarios
that reiterate my point that this type of global value is something that would be dangerous
to ever change....

Yeah, I guess I don't think he is an idiot. I just think he is a moron for suggesting such
a thing without warning of the consequences.

bq. Personally I never change the value of <luceneAnalyzerVersionDefault/> once I have
an existing schema.xml file. Instead I suggest you add luceneVersion="3.2" to your <filter
class="solr.FooTokenFilterFactory" /> declaration so that you know you are only changing
the behavior you want to change.

Good for Ernest. I guess he is probably still using Windows 3.1 too, because he doesn't ever
want to upgrade. That is, unless Ernest also carefully reads Lucene's CHANGES, reads all the
Solr source code, and knows which Solr features are tied to which Lucene features, because it's
not obvious at all: e.g. Solr's snowball factory doesn't use Lucene's snowball, etc.

bq. At the end of the day it just seems like a bigger risk than a feature ... I feel like
I must still be misunderstanding the motivation you guys have for adding it, because it really
seems like it boils down to "easier than having the property 2.9 set on every analyzer/factory"

Yes, you are right: personally, I don't want all users to be stuck with Version.LUCENE_24 forever.

> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>                 Key: SOLR-1677
>                 URL:
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility
with old indexes created using older versions of Lucene. The most important example is StandardTokenizer,
which changed its behaviour with posIncr and incorrect host token types in 2.4 and also in
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with much more
Unicode support, almost every Tokenizer/TokenFilter needs this Version parameter. In 2.9,
the deprecated old ctors without Version take LUCENE_24 as default to mimic the old behaviour,
e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base factories.
Subclasses can then use the decoded luceneMatchVersion enum (in 3.0) / parameter (in 2.9)
for constructing TokenStreams. The code currently contains a helper map to decode the version
strings, but in 3.0 it can be replaced by Version.valueOf(String), as Version is a Java 5
enum there. The default value is Version.LUCENE_24 (as this is the default for the no-version
ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from StopFilterFactory (now
done by Lucene since 2.9). The generics are also fixed to match Lucene 3.0.
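
To make the decoding step in the description above concrete, here is a minimal sketch of turning
a configured version string into an enum value, with LUCENE_24 as the fallback. It is not the
code from the attached patch: MatchVersion is a hypothetical stand-in for o.a.lucene.util.Version,
MatchVersionParser and parse are made-up names, and accepting the dotted form "2.9" is an
assumption made only for illustration.

```java
import java.util.Locale;

/** Hypothetical stand-in for o.a.lucene.util.Version; NOT the real Lucene class. */
enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

/** Hypothetical helper; the name and behaviour are assumptions, not taken from the patch. */
final class MatchVersionParser {
    /** Mirrors the description: a missing version behaves like LUCENE_24. */
    static final MatchVersion DEFAULT = MatchVersion.LUCENE_24;

    private MatchVersionParser() {}

    static MatchVersion parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT;
        }
        String name = raw.trim().toUpperCase(Locale.ROOT);
        // Assumed convenience: rewrite the dotted form "2.9" to the constant name "LUCENE_29".
        if (name.matches("\\d+\\.\\d+")) {
            name = "LUCENE_" + name.replace(".", "");
        }
        try {
            // Enum.valueOf does the lookup that a hand-written helper map would otherwise do.
            return MatchVersion.valueOf(name);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unknown match version: '" + raw + "'", e);
        }
    }
}
```

A factory could call MatchVersionParser.parse(args.get("luceneMatchVersion")) once during
initialization and pass the result to the tokenizer or filter constructor. Failing loudly on an
unknown string, rather than silently falling back to the default, is a design choice of this
sketch.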

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
