lucene-java-user mailing list archives

From Carsten Schnober
Subject Re: TokenStreamComponents in Lucene 4.0
Date Tue, 20 Nov 2012 11:26:48 GMT
On 20.11.2012 10:22, Uwe Schindler wrote:


> The createComponents() method of Analyzers is only called *once* for each thread and
> the TokenStream is *reused* for later documents. The Analyzer will call the final method
> Tokenizer#setReader() to notify the Tokenizer of a new Reader (this method will update the
> protected "input" field in the Tokenizer base class) and then it will reset() the whole
> tokenization chain. The custom TokenStream components must "initialize" themselves with
> the new settings on the reset() method.

Thanks, Uwe!
I think what changed compared to Lucene 3.6 is that reset() is now
called upon initialization as well, instead of only after the first
document has been processed, right? Apart from the fact that it was
previously not obligatory to make all components reusable, I suppose.
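
To make sure I have the lifecycle right, here is a minimal plain-Java sketch of the contract as described in the quote above. It deliberately does not use any Lucene classes: SketchTokenizer and SketchAnalyzer are hypothetical stand-ins, the per-thread caching is simplified to a single cached instance, and the real Lucene API may differ in its details.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reusable tokenizer; NOT a Lucene class.
class SketchTokenizer {
    private Reader input;       // plays the role of the "input" field mentioned in the quote
    private int tokensEmitted;  // per-document state that has to be cleared in reset()

    // Called for every new document to hand over the new Reader.
    void setReader(Reader reader) {
        this.input = reader;
    }

    // Re-initializes per-document state; called before every document, including the first.
    void reset() {
        tokensEmitted = 0;
    }

    // Returns the next whitespace-delimited token, or null at the end of the input.
    String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (sb.length() > 0) {
                    break;  // the token is complete
                }
            } else {
                sb.append((char) c);
            }
        }
        if (sb.length() == 0) {
            return null;
        }
        tokensEmitted++;
        return sb.toString();
    }

    int tokensEmitted() {
        return tokensEmitted;
    }
}

// Hypothetical stand-in for an analyzer that creates its components once and then reuses them.
class SketchAnalyzer {
    private SketchTokenizer cached;  // simplification: one instance instead of one per thread
    int createCalls = 0;

    private SketchTokenizer createComponents() {
        createCalls++;
        return new SketchTokenizer();
    }

    List<String> analyze(String text) {
        if (cached == null) {
            cached = createComponents();  // happens only for the first document
        }
        cached.setReader(new StringReader(text));  // notify the tokenizer of the new Reader
        cached.reset();                            // then reset before consuming any tokens
        List<String> tokens = new ArrayList<>();
        try {
            for (String t = cached.next(); t != null; t = cached.next()) {
                tokens.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    int lastTokenCount() {
        return cached == null ? 0 : cached.tokensEmitted();
    }
}

class ReuseDemo {
    public static void main(String[] args) {
        SketchAnalyzer analyzer = new SketchAnalyzer();
        System.out.println(analyzer.analyze("first document here"));
        System.out.println(analyzer.analyze("second one"));
        System.out.println("createComponents() calls: " + analyzer.createCalls);
    }
}
```

The point of the sketch is that the counter is cleared in reset() rather than in the constructor, so the same tokenizer instance behaves correctly for the second document even though it was constructed only once.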

Institut für Deutsche Sprache |
Projekt KorAP                 |
Tel. +49-(0)621-43740789      |
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
