lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)
Date Mon, 19 Oct 2009 17:53:59 GMT


Uwe Schindler updated LUCENE-1987:

    Attachment: LUCENE-1987-StopFilter-backport29.patch

Here are 2 mega patches and one backport to 2.9 (I want to get this in before 2.9.1):

All core tests pass, all bw tests pass. Most contrib tests also pass, but we have the following
problems and inconsistencies:

- benchmark does not work any longer, because StandardAnalyzer has no default ctor anymore
and cannot be instantiated by reflection, same with StopAnalyzer
- Highlighter only works if StandardAnalyzer is in 2.4 mode; in 2.9 mode (current) it fails
because the position increments of stop words are not correctly respected. This fails in addition to, and in combination
with, the following:
- Very bad inconsistency: The default of QueryParser is to ignore position increments, but
the current version of StandardAnalyzer uses posIncr for stop words -> bäng. We should
change the default for QueryParser (+ contrib QP), too. Much rework and much
documentation are needed. The tests in core now pass, as most parts use StandardAnalyzer in 2.9 mode
but have no stop words, and the special tests explicitly set the posIncr flag. This is totally
broken and needs fixing! (It also affects 2.9.0, if somebody uses the new StandardAnalyzer.)
- XMLQueryParser also fails with latest StandardAnalyzer version, because it cannot set the
flag in QueryParser. In my opinion, the query parser should take the flag from the analyzer,
but this is not easy to fix.
- All contrib analyzers have stopWordPosIncr turned off (backwards compatibility). Maybe we
need a Version Parameter in all analyzers there too!
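
To make the QueryParser/StandardAnalyzer mismatch from the list above concrete, here is a
minimal sketch in plain Java. It does not use any Lucene classes: PosIncrDemo, Tok, analyze()
and phraseMatches() are hypothetical stand-ins, and the stop word list is made up for the
example. It only assumes that the index side keeps position gaps for removed stop words while
the query side may or may not keep them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PosIncrDemo {
    // Hypothetical stop word list, for illustration only.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "of", "a"));

    /** A term together with its absolute position in the token stream. */
    static final class Tok {
        final String term;
        final int pos;
        Tok(String term, int pos) { this.term = term; this.pos = pos; }
    }

    /**
     * Splits on whitespace, lowercases, and drops stop words. If respectPosIncr
     * is true, every removed stop word leaves a gap in the positions; if false,
     * the surviving terms are numbered consecutively.
     */
    static List<Tok> analyze(String text, boolean respectPosIncr) {
        List<Tok> out = new ArrayList<>();
        int pos = -1;
        int skipped = 0;
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (STOP_WORDS.contains(word)) {
                if (respectPosIncr) skipped++;
                continue;
            }
            pos += 1 + skipped;
            skipped = 0;
            out.add(new Tok(word, pos));
        }
        return out;
    }

    /** Exact phrase match (no slop): relative positions must agree. */
    static boolean phraseMatches(List<Tok> indexed, List<Tok> query) {
        if (query.isEmpty()) return false;
        Tok first = query.get(0);
        for (Tok start : indexed) {
            if (!start.term.equals(first.term)) continue;
            boolean all = true;
            for (Tok q : query) {
                if (!contains(indexed, q.term, start.pos + (q.pos - first.pos))) {
                    all = false;
                    break;
                }
            }
            if (all) return true;
        }
        return false;
    }

    static boolean contains(List<Tok> toks, String term, int pos) {
        for (Tok t : toks) {
            if (t.pos == pos && t.term.equals(term)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Index side keeps gaps for stop words: lord@1, rings@4.
        List<Tok> indexed = analyze("The Lord of the Rings", true);
        // Query side ignores gaps (lord@0, rings@1): the phrase misses.
        System.out.println(phraseMatches(indexed, analyze("lord of the rings", false))); // false
        // Query side keeps gaps (lord@0, rings@3): the phrase matches.
        System.out.println(phraseMatches(indexed, analyze("lord of the rings", true)));  // true
    }
}
```

In this toy model the phrase only matches when both sides agree on whether stop words leave
gaps, which is why a shared setting (for example, taken from the analyzer) looks attractive.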

What to do? After this StopFilter/StandardAnalyzer-hell-day, Aspirin and Paracetamol and beer
are not enough to think clearly again...

And please: next time we deprecate APIs, remove all deprecated calls from tests and contrib
and mark all tests of deprecated APIs as such!

> Remove rest of analysis deprecations (Token, CharacterCache)
> ------------------------------------------------------------
>                 Key: LUCENE-1987
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: Analysis
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.0
>         Attachments: LUCENE-1987-StopFilter-backport29.patch, LUCENE-1987-StopFilter-BW.patch,
LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch,
LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch, LUCENE-1987.patch
> This removes the rest of the deprecations in the analysis package:
> - -Token's termText field-- (DONE)
> - -eventually un-deprecate ctors of Token taking Strings (they are still useful) ->
if yes remove deprec in 2.9.1- (DONE)
> - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
> - Stopwords lists
> - Remove the backwards settings from analyzers (acronym, posIncr,...). They are deprecated,
but we still have the VERSION constants. I do not know how to proceed. Keep the settings alive
for index compatibility? Or remove them together with the version constants (which were undeprecated)?

