lucene-solr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
Date Fri, 15 Jan 2010 03:08:54 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800518#action_12800518
] 

Robert Muir commented on SOLR-1677:
-----------------------------------

bq. The implication i got from Robert was that there was (or would soon be) expectations in
Lucene-Java code that if one object was told to use Version.X it wold be assumed that every
other object in the application was using Version.X.

Hoss, I didn't mean to imply any such thing, just that i don't see any tests (or the framework
for testing such behavior), so even if its officially supported, in my opinion it does not
exist.

For example, as far as analysis goes, my personal opinion is that in any given package (say
one language, or whatever), we will test the entire Analyzer against Version X, and will test
that back compat works for Version Y, Z, etc. 

But i personally can't see myself ensuring the all the underlying tokenstreams (maybe this
language uses 5 lets say), works across all the permutations of different versions { X, Y,
Z } you can apply, its simply asking too much.

> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1677
>                 URL: https://issues.apache.org/jira/browse/SOLR-1677
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility
with old indexes created using older versions of Lucene. The most important example is StandardTokenizer,
which changed its behaviour with posIncr and incorrect host token types in 2.4 and also in
2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with much more
Unicode support, almost every Tokenizer/TokenFilter needs this Version parameter. In 2.9,
the deprecated old ctors without Version take LUCENE_24 as default to mimic the old behaviour,
e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base factories.
Subclasses then can use the luceneMatchVersion decoded enum (in 3.0) / Parameter (in 2.9)
for constructing Tokenstreams. The code currently contains a helper map to decode the version
strings, but in 3.0 is can be replaced by Version.valueOf(String), as the Version is a subclass
of Java5 enums. The default value is Version.LUCENE_24 (as this is the default for the no-version
ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from StopFilterFactory (now
done by Lucene since 2.9). The generics are also fixed to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message