lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3055) LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers
Date Fri, 29 Apr 2011 20:35:03 GMT


Robert Muir commented on LUCENE-3055:

Hi Ian, you are right: the justifications don't fully explain the reasoning behind this change.

From my perspective, the most important reason is to avoid a huge performance trap: previously,
if you subclassed one of these analyzers, overrode tokenStream(), and added SpecialFilter,
for example, you would in most cases actually slow down indexing, because the indexer could
no longer use reusableTokenStream().

This created worst-case situations like LUCENE-2279.
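
To make the cost concrete, here is a minimal toy model of the trap in plain Java. It does not
use Lucene's classes: Chain, ToyAnalyzer, and chainsBuilt are hypothetical names invented for
this sketch, and the "analysis" is just whitespace splitting plus lowercasing. It only assumes
what is described above, namely that one entry point builds a fresh chain per call while the
other hands back a cached chain and resets it.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseSketch {
    /** Stand-in for a tokenizer + filter chain; counts how many get built. */
    static final class Chain {
        static int created = 0;
        private String text = "";

        Chain() { created++; }

        /** Points the existing chain at a new document instead of rebuilding it. */
        void reset(String newText) { this.text = newText; }

        /** Toy analysis: split on whitespace and lowercase each token. */
        List<String> tokens() {
            List<String> out = new ArrayList<>();
            for (String t : text.trim().split("\\s+")) {
                if (!t.isEmpty()) out.add(t.toLowerCase());
            }
            return out;
        }
    }

    /** Hypothetical analyzer exposing the two entry points discussed above. */
    static class ToyAnalyzer {
        private final ThreadLocal<Chain> cached = new ThreadLocal<>();

        /** Non-reusing path: a new chain for every document. */
        Chain tokenStream(String text) {
            Chain c = new Chain();
            c.reset(text);
            return c;
        }

        /** Reusing path: one chain per thread, reset for each document. */
        Chain reusableTokenStream(String text) {
            Chain c = cached.get();
            if (c == null) {
                c = new Chain();
                cached.set(c);
            }
            c.reset(text);
            return c;
        }
    }

    /** "Indexes" the given number of documents and reports how many chains were allocated. */
    static int chainsBuilt(boolean reuse, int docs) {
        Chain.created = 0;
        ToyAnalyzer analyzer = new ToyAnalyzer();
        for (int i = 0; i < docs; i++) {
            String text = "Doc Number " + i;
            Chain c = reuse ? analyzer.reusableTokenStream(text) : analyzer.tokenStream(text);
            c.tokens();
        }
        return Chain.created;
    }

    public static void main(String[] args) {
        System.out.println("without reuse: " + chainsBuilt(false, 1000)); // prints 1000
        System.out.println("with reuse:    " + chainsBuilt(true, 1000));  // prints 1
    }
}
```

In this sketch the non-reusing path allocates one chain per document, while the reusing path
allocates a single chain for the whole run. A real tokenizer and filter chain carries far more
state than this toy, so the per-document cost there would be correspondingly larger.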

Instead, the recommended approach is to just let analyzers be tokenstream factories (which
is all they are). They aren't really "extendable", only "overridable", since they are just
factories for tokenstreams, and overriding one creates the worst-case performance trap, where
new objects are created for every document. I would recommend writing your analyzer by
extending ReusableAnalyzerBase instead, which is easy and safe:
Analyzer analyzer = new ReusableAnalyzerBase() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    // Build the chain once; ReusableAnalyzerBase reuses it for later documents.
    Tokenizer tokenizer = new WhitespaceTokenizer(...);
    TokenStream filteredStream = new FooTokenFilter(tokenizer, ...);
    filteredStream = new BarTokenFilter(filteredStream, ...);
    return new TokenStreamComponents(tokenizer, filteredStream);
  }
};

> LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers
> ----------------------------------------------------------------------
>                 Key: LUCENE-3055
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 3.1
>            Reporter: Ian Soboroff
> LUCENE-2372 and LUCENE-2389 marked all analyzers as final.  This makes ReusableAnalyzerBase
> useless, and makes it impossible to subclass e.g. StandardAnalyzer to make a small modification
> e.g. to tokenStream().  These issues don't indicate a new method of doing this.  The issues
> don't give a reason except for design considerations, which seems a poor reason to make a
> backward-incompatible change.

This message is automatically generated by JIRA.