lucene-dev mailing list archives

From Uwe Schindler <...@thetaphi.de>
Subject Re: ReusableAnalyzerBase bug in 3.4.0?
Date Tue, 01 Nov 2011 22:18:25 GMT
Every filter must support reset(). The analyzer's reuse handling does not need to call it,
because the consumer must call reset() before calling incrementToken() for the first time.

In 3.4 there are sometimes redundant extra calls to reset(), kept for backwards-compatibility
reasons (for example on the sink), but in trunk (aka 4.0) they are no longer there.

The need to call reset() before consuming is described in the TokenStream javadocs.
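
To make that contract concrete, here is a minimal self-contained sketch. It does not use Lucene's real classes: SimpleStream, ListSource, LimitFilter and ResetContractDemo are hypothetical stand-ins invented for illustration. The reuse step only points the source at new input, and the stateful filter still behaves correctly on the second pass because the consumer calls reset() on the outermost stream before pulling tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for a token stream; NOT Lucene's TokenStream (assumption).
abstract class SimpleStream {
    abstract boolean incrementToken(); // advance to the next token; false at end
    abstract String term();            // text of the current token
    abstract void reset();             // clear per-stream state before consuming
}

// Hypothetical reusable source: can be pointed at new input between uses.
class ListSource extends SimpleStream {
    private List<String> words = new ArrayList<>();
    private Iterator<String> it = words.iterator();
    private String current;

    // Plays the role of resetting the token source with a new Reader.
    void setInput(List<String> newWords) {
        words = newWords;
        it = words.iterator();
    }

    @Override boolean incrementToken() {
        if (!it.hasNext()) return false;
        current = it.next();
        return true;
    }

    @Override String term() { return current; }

    @Override void reset() { it = words.iterator(); }
}

// Hypothetical stateful filter: lets at most `limit` tokens through.
class LimitFilter extends SimpleStream {
    private final SimpleStream input;
    private final int limit;
    private int seen = 0;

    LimitFilter(SimpleStream input, int limit) {
        this.input = input;
        this.limit = limit;
    }

    @Override boolean incrementToken() {
        if (seen >= limit || !input.incrementToken()) return false;
        seen++;
        return true;
    }

    @Override String term() { return input.term(); }

    @Override void reset() {
        input.reset(); // propagate the reset down the chain
        seen = 0;      // clear this filter's own state
    }
}

class ResetContractDemo {
    // Consumer workflow: call reset() first, then pull tokens until exhausted.
    static List<String> consume(SimpleStream stream) {
        stream.reset();
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) out.add(stream.term());
        return out;
    }

    public static void main(String[] args) {
        ListSource source = new ListSource();
        LimitFilter sink = new LimitFilter(source, 2);

        source.setInput(Arrays.asList("a", "b", "c"));
        System.out.println(consume(sink)); // prints [a, b]

        // Reuse: only the source gets new input; the consumer's reset() clears the filter.
        source.setInput(Arrays.asList("d", "e", "f"));
        System.out.println(consume(sink)); // prints [d, e]
    }
}
```

If consume() skipped the reset() call, the second pass in this sketch would return no tokens, because the filter's counter would still be at its limit.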

Uwe
--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de



Paul Jakubik <paul@purediscovery.com> wrote:

Hi,


I think I found a bug in ReusableAnalyzerBase, but am also wondering if I'm simply missing
something. Let me describe what I am seeing, and maybe you can point out where I'm making
bad assumptions.


By using the ReusableAnalyzerBase you can create a single shared analyzer, and it contains
code to make the interesting parts of your analyzer thread local.


Part of making this work is putting all of the interesting components inside of a
ReusableAnalyzerBase.TokenStreamComponents.


When you call ReusableAnalyzerBase.reusableTokenStream, it checks whether it has a thread-local
TokenStreamComponents, and if so it calls TokenStreamComponents.reset(Reader), which resets the
token source. This method does not reset the TokenStream sink in TokenStreamComponents.


Because of this, if any of the filters in the TokenStream are stateful, you have to recreate
them instead of resetting them and using them again. So if you use a filter like LimitTokenCountFilter
or ShingleFilter, you have to recreate it, even though these filters have reset methods that
could be called.


Am I missing important reasons why TokenStreamComponents.reset is implemented as:

    protected boolean reset(final Reader reader) throws IOException {
      source.reset(reader);
      return true;
    }


instead of

    protected boolean reset(final Reader reader) throws IOException {
      source.reset(reader);
      sink.reset();
      return true;
    }


If there is a good reason to avoid resetting the sink here, then would it help other people
to better document that implementations of ReusableAnalyzerBase.createComponents should not
create stateful components?


Paul





