lucene-java-user mailing list archives

From Robert Muir <>
Subject Re: reset versus setReader on TokenStream
Date Wed, 29 Aug 2012 19:37:40 GMT
ok, let's help improve it: I think these two have probably always been confusing.

Before, they were both named reset: reset() and reset(Reader), even though
they are unrelated. I thought the rename would help with this :)

Does the TokenStream workflow here help?
Basically, reset() is a mandatory call the consumer must make. It just
means 'reset any mutable state so you can be reused for processing'.

This applies to any TokenStream: Tokenizers, TokenFilters, or even
some direct descendant you write that parses byte arrays, or anything
else.
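To make the consumer side concrete, here is a minimal sketch in plain Java. It does not use Lucene: ToyTokenStream and consume() are hypothetical names in a simplified model that only captures the call order a consumer follows (reset(), then incrementToken() until it returns false, then end() and close()). Throwing IllegalStateException when reset() was skipped is an assumption of the model, chosen to show that reset() is not optional.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ConsumerWorkflowSketch {

    // Hypothetical stand-in for a TokenStream (NOT Lucene's class). It only
    // models the order in which a consumer has to call the methods.
    static class ToyTokenStream {
        private final Iterator<String> input; // consumed once, like a Reader
        private boolean ready = false;        // set by reset(), cleared by close()
        String term;                          // the "current token"

        ToyTokenStream(Iterator<String> input) {
            this.input = input;
        }

        // reset(): clear mutable state so the stream can be consumed.
        void reset() {
            ready = true;
            term = null;
        }

        // Model assumption: refuse to produce tokens until reset() was called.
        boolean incrementToken() {
            if (!ready) {
                throw new IllegalStateException("reset() was not called");
            }
            if (!input.hasNext()) {
                return false;
            }
            term = input.next();
            return true;
        }

        // end(): end-of-stream hook (real streams do final offset bookkeeping here).
        void end() {
            term = null;
        }

        // close(): release resources; the stream must be reset() again before reuse.
        void close() {
            ready = false;
        }
    }

    // The consumer workflow: reset(), incrementToken() loop, end(), close().
    static List<String> consume(ToyTokenStream ts) {
        List<String> tokens = new ArrayList<>();
        ts.reset();
        while (ts.incrementToken()) {
            tokens.add(ts.term);
        }
        ts.end();
        ts.close();
        return tokens;
    }

    public static void main(String[] args) {
        ToyTokenStream ts = new ToyTokenStream(Arrays.asList("the", "quick", "fox").iterator());
        System.out.println(consume(ts)); // [the, quick, fox]
    }
}
```

The point of the model is that the consumer, not the stream, drives the lifecycle: whoever calls incrementToken() is also responsible for calling reset() first.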

This means that if you keep some state across tokens (like
StopFilter's #skippedTokens), reset() is where you set it back to 0.
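A sketch of that idea, again as a plain-Java model rather than Lucene code: ToySource, ToyStopFilter and consume() are hypothetical, and skippedTokens simply mirrors the counter mentioned above. The filter forwards reset() to its upstream stream and then zeroes its own counter, so reusing it on a second input does not carry the old count over. In real Lucene code this would roughly correspond to overriding reset() in your TokenFilter and calling super.reset() before clearing your own fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class StatefulFilterSketch {

    // Hypothetical mini token source (NOT Lucene's Tokenizer).
    static class ToySource {
        private Iterator<String> input;
        String term;

        void setInput(List<String> tokens) {
            input = tokens.iterator();
        }

        void reset() {
            term = null;
        }

        boolean incrementToken() {
            if (!input.hasNext()) {
                return false;
            }
            term = input.next();
            return true;
        }
    }

    // A stop filter that counts how many tokens it skipped since the last
    // reset(); that counter is the mutable state reset() must clear.
    static class ToyStopFilter {
        private final ToySource upstream;
        private final Set<String> stopWords;
        int skippedTokens = 0; // state kept across tokens
        String term;

        ToyStopFilter(ToySource upstream, Set<String> stopWords) {
            this.upstream = upstream;
            this.stopWords = stopWords;
        }

        void reset() {
            upstream.reset();  // propagate reset() down the chain first
            skippedTokens = 0; // then clear this filter's own state
            term = null;
        }

        boolean incrementToken() {
            while (upstream.incrementToken()) {
                if (stopWords.contains(upstream.term)) {
                    skippedTokens++;
                    continue;
                }
                term = upstream.term;
                return true;
            }
            return false;
        }
    }

    static List<String> consume(ToyStopFilter filter) {
        List<String> out = new ArrayList<>();
        filter.reset();
        while (filter.incrementToken()) {
            out.add(filter.term);
        }
        return out;
    }

    public static void main(String[] args) {
        ToySource src = new ToySource();
        ToyStopFilter filter = new ToyStopFilter(src, new HashSet<>(Arrays.asList("the", "a")));

        src.setInput(Arrays.asList("the", "quick", "the", "fox"));
        System.out.println(consume(filter) + " skipped=" + filter.skippedTokens); // [quick, fox] skipped=2

        // Reuse with new input: reset() zeroes the counter instead of carrying 2 over.
        src.setInput(Arrays.asList("a", "dog"));
        System.out.println(consume(filter) + " skipped=" + filter.skippedTokens); // [dog] skipped=1
    }
}
```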

setReader(Reader) exists only on Tokenizer; it means 'replace the Reader
with a different one to be processed'.
The fact that CharTokenizer does 'reset()-like stuff' in setReader() is
bogus IMO, but I don't think it will cause any bugs. Don't emulate it.
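The split between setReader() and reset() can be sketched the same way. ToyWhitespaceTokenizer is a hypothetical plain-Java model, not Lucene's Tokenizer or CharTokenizer: setReader() only swaps the Reader, while the offset bookkeeping is cleared in reset(). The usage in main() also speaks to the rewind question quoted below, at least for this model: calling reset() without setReader() does not rewind anything, because the tokenizer keeps no copy of its input; to reprocess text you hand it a fresh Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SetReaderSketch {

    // Hypothetical whitespace tokenizer (NOT Lucene's Tokenizer/CharTokenizer).
    static class ToyWhitespaceTokenizer {
        private Reader input;
        private int offset;  // chars consumed so far: per-stream mutable state
        String term;
        int startOffset;

        ToyWhitespaceTokenizer(Reader input) {
            this.input = input;
        }

        // setReader(): only swap the input; no other state is touched here.
        void setReader(Reader input) {
            this.input = input;
        }

        // reset(): clear per-stream state. It does NOT rewind the Reader.
        void reset() {
            offset = 0;
            startOffset = 0;
            term = null;
        }

        boolean incrementToken() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                offset++;
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) {
                        break; // whitespace after a token ends it
                    }
                } else {
                    if (sb.length() == 0) {
                        startOffset = offset - 1; // offset of the token's first char
                    }
                    sb.append((char) c);
                }
            }
            if (sb.length() == 0) {
                return false; // input exhausted
            }
            term = sb.toString();
            return true;
        }
    }

    static List<String> consume(ToyWhitespaceTokenizer t) throws IOException {
        List<String> out = new ArrayList<>();
        t.reset();
        while (t.incrementToken()) {
            out.add(t.term + "@" + t.startOffset);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        ToyWhitespaceTokenizer t = new ToyWhitespaceTokenizer(new StringReader("hello big world"));
        System.out.println(consume(t)); // [hello@0, big@6, world@10]

        // reset() without setReader(): the Reader is exhausted, nothing comes back.
        System.out.println(consume(t)); // []

        // setReader() then reset(): new input, offsets start from 0 again.
        t.setReader(new StringReader("foo bar"));
        System.out.println(consume(t)); // [foo@0, bar@4]
    }
}
```

Keeping setReader() trivial means reset(), which every consumer already has to call, is the single place a subclass clears its state.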

On Wed, Aug 29, 2012 at 3:29 PM, Benson Margulies <> wrote:
> I've read the javadoc through a few times, but I confess that I'm still
> feeling dense.
> Are all tokenizers responsible for implementing some way of retaining the
> contents of their reader, so that a call to reset without a call to
> setReader rewinds? I note that CharTokenizer doesn't implement #reset,
> which leads me to suspect that I'm not responsible for the rewind behavior.

