lucene-dev mailing list archives

From Eran Sevi <erans...@gmail.com>
Subject Bug in StandardAnalyzer + StopAnalyzer?
Date Sun, 15 Nov 2009 16:19:10 GMT
Hi,
While changing my code to support the not-so-new reusableTokenStream, I
noticed that in analyzers that use a SavedStreams class (Standard, Stop,
and maybe others as well), the reset() method is called on the tokenizer
instead of on the filter.

A filter's reset() calls reset() on the stream it wraps, so resetting the
outermost filter resets every inner filter and finally the tokenizer. The
tokenizer's reset() can't do the reverse, because it has no reference to the
filters wrapped around it.
I think this bug hasn't caused any errors so far because none of the filters
used in these analyzers overrides reset(), but it could cause problems if an
implementation changes in the future.
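
To illustrate the concern, here is a minimal, self-contained sketch. It does
not use the Lucene classes: Stream, WhitespaceSource, Filter and
CountingFilter are hypothetical stand-ins for TokenStream, Tokenizer,
TokenFilter and a stateful filter that overrides reset(). The only behaviour
assumed is the one described above, namely that a filter's reset() delegates
to its input while the tokenizer cannot reach the filters around it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ResetChainSketch {

    // Stand-in for TokenStream: reset() does nothing by default.
    static abstract class Stream {
        abstract String next(); // returns null at end of input
        void reset() {}
    }

    // Stand-in for a Tokenizer: reset(Reader) swaps the input only.
    static class WhitespaceSource extends Stream {
        private Reader input;
        WhitespaceSource(Reader input) { this.input = input; }
        void reset(Reader newInput) { this.input = newInput; }
        @Override String next() {
            try {
                StringBuilder sb = new StringBuilder();
                int c;
                while ((c = input.read()) != -1) {
                    if (Character.isWhitespace(c)) {
                        if (sb.length() > 0) break; // token finished
                    } else {
                        sb.append((char) c);
                    }
                }
                return sb.length() > 0 ? sb.toString() : null;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Stand-in for TokenFilter: reset() delegates to the wrapped stream.
    static class Filter extends Stream {
        protected final Stream input;
        Filter(Stream input) { this.input = input; }
        @Override String next() { return input.next(); }
        @Override void reset() { input.reset(); }
    }

    // Hypothetical stateful filter: prefixes each token with a running count
    // and relies on reset() to start counting from 1 again.
    static class CountingFilter extends Filter {
        private int count = 0;
        CountingFilter(Stream input) { super(input); }
        @Override String next() {
            String token = input.next();
            return token == null ? null : (++count) + ":" + token;
        }
        @Override void reset() { super.reset(); count = 0; }
    }

    // Consumes one document, reuses the chain for a second one, and returns
    // the first token of the second document.
    public static String firstToken(boolean resetViaFilter) {
        WhitespaceSource source = new WhitespaceSource(new StringReader("a b"));
        Stream chain = new CountingFilter(source);
        while (chain.next() != null) { /* consume the first document */ }
        source.reset(new StringReader("c d")); // tokenizer-only reset
        if (resetViaFilter) {
            chain.reset(); // reset through the outermost filter as well
        }
        return chain.next();
    }

    public static void main(String[] args) {
        System.out.println(firstToken(false)); // 3:c (stale filter state)
        System.out.println(firstToken(true));  // 1:c
    }
}
```

In this toy model reset() takes no reader, so the sketch keeps the
reset(reader) call on the source and adds the chain-level reset() after it.
Whether the real analyzers would need both calls is an assumption I have not
verified against the Lucene sources.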

The fix is very simple. For example, in StandardAnalyzer:

    if (streams == null) {
      streams = new SavedStreams();
      setPreviousTokenStream(streams);
      streams.tokenStream = new StandardTokenizer(matchVersion, reader);
      streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
      streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
      streams.filteredTokenStream = new StopFilter(
          StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
          streams.filteredTokenStream, stopSet);
    } else {
      streams.tokenStream.reset(reader);
    }

should become:

    if (streams == null) {
      streams = new SavedStreams();
      setPreviousTokenStream(streams);
      streams.tokenStream = new StandardTokenizer(matchVersion, reader);
      streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
      streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
      streams.filteredTokenStream = new StopFilter(
          StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
          streams.filteredTokenStream, stopSet);
    } else {
      streams.filteredTokenStream.reset(); // changed line.
    }


What do you think?

Eran.
