lucene-solr-dev mailing list archives

From "Hoss Man (JIRA)" <>
Subject [jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
Date Mon, 21 Dec 2009 15:20:18 GMT


Hoss Man commented on SOLR-1677:

* As a first hack the solrConfig schema has a new element <luceneMatchVersion> that
contains a solr-wide default luceneMatchVersion value that is used as the default for QueryParser
and Analyzers if not specified differently
* On the analyzer side, BaseTokenizerFactory and BaseTokenFilterFactory now extend SolrCoreAware
(and I also allowed these classes to be SolrCoreAware) and get the SolrConfig.

I'd really prefer that nothing like this make it into solr.

One: we've worked pretty hard to make sure that nothing in the analysis code is SolrCoreAware
-- the goal was to try and keep the schema related code reusable w/o risk of factories adding
tendrils that reach deep into the other solr code (it's only a matter of time until someone
starts refactoring all of the schema related code out of Solr and into a Lucene contrib).

If we really want to add a new "global" setting for the default match version, it should be
in schema.xml, as it pertains to the index itself and how to read/write to the index "properly"
and not to the particularities of how a particular solr installation might be using that data
(schema.xml => the nature of the data; solrconfig.xml => the usage of the data)

Two: I really question the need for a configurable default across all analysis factories.
This seems like the type of thing that's going to be changed rarely if ever, and when it
is changed each field will need to be considered very carefully to decide whether the "new"
behavior is desired over the "old".

I suspect the only time anyone is going to upgrade all factories at once is when we rev lucene
jars and update the example configs -- in that case (and in the case of a user who is happy
to blow away all of their data and take the newest, regardless of what it is, for every analyzer)
a search and replace seems perfectly appropriate.

> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>                 Key: SOLR-1677
>                 URL:
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility
> with old indexes created using older versions of Lucene. The most important example is StandardTokenizer,
> which changed its behaviour with posIncr and incorrect host token types in 2.4 and also in
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with much more
> Unicode support, almost every Tokenizer/TokenFilter needs this Version parameter. In 2.9,
> the deprecated old ctors without Version take LUCENE_24 as default to mimic the old behaviour,
> e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base factories.
> Subclasses then can use the luceneMatchVersion decoded enum (in 3.0) / parameter (in 2.9)
> for constructing TokenStreams. The code currently contains a helper map to decode the version
> strings, but in 3.0 it can be replaced by Version.valueOf(String), as the Version is a subclass
> of Java5 enums. The default value is Version.LUCENE_24 (as this is the default for the no-version
> ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from StopFilterFactory (now
> done by Lucene since 2.9). The generics are also fixed to match Lucene 3.0.
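
For readers following the issue description above, here is a rough sketch of the two decoding approaches it mentions: a helper map from version strings to constants, and Enum.valueOf(String) once Version is an enum. This is not the code from the attached patch. The Version enum below is a hypothetical stand-in for o.a.lucene.util.Version, and the class name, method names, and the "luceneMatchVersion" argument key on a factory's args map are all assumptions made for illustration. Only the LUCENE_24 default comes from the description.

```java
import java.util.HashMap;
import java.util.Map;

class VersionDecoding {

    // Hypothetical stand-in for org.apache.lucene.util.Version (assumption:
    // only the constant names are taken from the issue text).
    enum Version { LUCENE_24, LUCENE_29, LUCENE_30 }

    // Default named in the issue: matches the deprecated no-Version ctors.
    static final Version DEFAULT_VERSION = Version.LUCENE_24;

    // Helper-map approach: needed while Version is not a Java 5 enum.
    private static final Map<String, Version> BY_NAME = new HashMap<>();
    static {
        BY_NAME.put("LUCENE_24", Version.LUCENE_24);
        BY_NAME.put("LUCENE_29", Version.LUCENE_29);
        BY_NAME.put("LUCENE_30", Version.LUCENE_30);
    }

    static Version decodeWithMap(String value) {
        if (value == null) {
            return DEFAULT_VERSION; // nothing configured: keep old behaviour
        }
        Version v = BY_NAME.get(value.trim());
        if (v == null) {
            throw new IllegalArgumentException("Unknown version: " + value);
        }
        return v;
    }

    // Enum approach: valueOf does the lookup and rejects unknown names itself.
    static Version decodeWithEnum(String value) {
        if (value == null) {
            return DEFAULT_VERSION;
        }
        return Version.valueOf(value.trim());
    }

    // Assumed shape of a per-factory setting: the version is read from the
    // factory's own init args, so no handle on the core or config is needed.
    static Version fromFactoryArgs(Map<String, String> args) {
        return decodeWithEnum(args.get("luceneMatchVersion"));
    }

    public static void main(String[] argv) {
        Map<String, String> args = new HashMap<>();
        System.out.println(fromFactoryArgs(args)); // no explicit version set
        args.put("luceneMatchVersion", "LUCENE_29");
        System.out.println(fromFactoryArgs(args)); // explicit per-factory version
    }
}
```

Reading the value from each factory's own arguments is one way to keep the factories free of any reference to the core, which is the concern raised in the comment; whether a shared default should exist on top of that is the open question in this thread.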

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
