lucene-java-user mailing list archives

From Robert Muir <>
Subject Re: reset versus setReader on TokenStream
Date Wed, 29 Aug 2012 20:51:58 GMT
On Wed, Aug 29, 2012 at 4:18 PM, Benson Margulies <> wrote:
> If I'm following, you've created a division of labor between setReader and
> reset.

That's not true. setReader shouldn't be doing any labor; it's really only
a setter!

One possibility here is to make it final (though it's not obvious to me
that this would clear up the situation; I think the javadocs are more
important here).
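
A minimal sketch of that division of labor follows. It assumes a hypothetical ChunkingTokenizer that is NOT Lucene's Tokenizer class. It only mirrors the pattern described here. setReader (made final, echoing the suggestion above) just stores the Reader in a protected `input` field. reset() rebuilds all per-document state from `input`. The "chunker" is a stand-in that reads the whole input and splits on whitespace, which is a simplification that a real chunking tokenizer would not make.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stateful tokenizer; not Lucene's API, only the setReader/reset split. */
class ChunkingTokenizer {
    // Plays the role of Tokenizer's protected 'input' field.
    protected Reader input;

    // Per-document state: only reset() is allowed to (re)build it.
    private final Deque<String> pending = new ArrayDeque<>();
    private String term;

    /** Only a setter: store the Reader and do no other work. */
    public final void setReader(Reader input) {
        this.input = input;
    }

    /** All per-document initialization happens here, reading from 'input'. */
    public void reset() throws IOException {
        pending.clear();
        term = null;
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = input.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        // Stand-in "chunker": split the whole input on whitespace.
        for (String chunk : sb.toString().trim().split("\\s+")) {
            if (!chunk.isEmpty()) {
                pending.add(chunk);
            }
        }
    }

    /** Advances to the next chunk; returns false once the input is exhausted. */
    public boolean incrementToken() {
        term = pending.poll();
        return term != null;
    }

    public String term() {
        return term;
    }
}
```

Reusing the instance for a second document is just another setReader followed by reset. Because all state is rebuilt in reset, nothing in setReader needs overriding.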

> We have a tokenizer that has a good deal of state, since it has to split
> the input into chunks. If I'm following here, you'd recommend that we do
> nothing special in setReader, but have #reset fix up all the state on the
> assumption that we are starting from the beginning of something, and
> we'd reinitialize our chunker over what was sitting in the protected
> 'input'. If someone called #setReader and neglected to call #reset, awful
> things would happen, but you've warned them.

If someone calls setReader and neglects to call reset, awful things
will happen to them in general: they would be violating the contract
of the API and the workflow described in the javadocs.
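
The consumer side of that workflow can be sketched as follows: setReader, then reset, then incrementToken until it returns false, then end, then close. WordTokenizer and Consumer.analyze are hypothetical names. They are small stand-ins written for illustration, not Lucene classes. Real Lucene consumers also read token text through attributes rather than a term() method.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical minimal tokenizer (whitespace split); not Lucene's Tokenizer. */
class WordTokenizer {
    private Reader input;
    private Iterator<String> words;
    private String term;

    void setReader(Reader input) { this.input = input; }   // just a setter

    void reset() throws IOException {                      // the real setup
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) sb.append((char) c);
        String text = sb.toString().trim();
        words = text.isEmpty() ? Collections.<String>emptyIterator()
                               : Arrays.asList(text.split("\\s+")).iterator();
    }

    boolean incrementToken() {
        term = words.hasNext() ? words.next() : null;
        return term != null;
    }

    String term() { return term; }
    void end() { term = null; }                             // end-of-stream hook
    void close() throws IOException { input.close(); input = null; }
}

class Consumer {
    /** Follows the documented order: setReader, reset, incrementToken..., end, close. */
    static List<String> analyze(WordTokenizer tok, String text) throws IOException {
        List<String> out = new ArrayList<>();
        tok.setReader(new StringReader(text));
        try {
            tok.reset();
            while (tok.incrementToken()) out.add(tok.term());
            tok.end();
        } finally {
            tok.close();   // release the Reader even if tokenizing failed
        }
        return out;
    }
}
```

The same WordTokenizer instance can be passed to analyze() repeatedly. Each call supplies a fresh Reader and resets before consuming.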

That's why we test as much consumer code as possible against
MockTokenizer (from the test-framework package). It has a state machine
that will fail if you do this.
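
To illustrate the idea of a contract-checking state machine, here is a simplified imitation. CheckedTokenizer and its state names are hypothetical. This is not MockTokenizer's actual code, and it emits the whole input as a single token. Each call verifies that it arrives in the expected state and throws IllegalStateException otherwise.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical, simplified contract checker; not MockTokenizer's source. */
class CheckedTokenizer {
    private enum State { CLOSED, READER_SET, RESET, EXHAUSTED, ENDED }

    private State state = State.CLOSED;
    private Reader input;
    private String pending;   // the single token still to be emitted, if any
    private String term;

    private void require(boolean ok, String msg) {
        if (!ok) throw new IllegalStateException(msg + " (state=" + state + ")");
    }

    public void setReader(Reader input) {
        require(state == State.CLOSED, "setReader() called before close()");
        this.input = input;
        state = State.READER_SET;
    }

    public void reset() {
        require(state == State.READER_SET, "reset() must follow setReader()");
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) sb.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        pending = sb.length() > 0 ? sb.toString() : null;
        state = State.RESET;
    }

    public boolean incrementToken() {
        require(state == State.RESET, "incrementToken() called without reset()");
        term = pending;
        pending = null;
        if (term == null) state = State.EXHAUSTED;
        return term != null;
    }

    public String term() { return term; }

    public void end() {
        require(state == State.EXHAUSTED, "end() called before incrementToken() returned false");
        state = State.ENDED;
    }

    /** Accepted from any open state so cleanup in a finally block still works. */
    public void close() {
        require(state != State.CLOSED, "close() called twice");
        try {
            input.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            input = null;
            state = State.CLOSED;
        }
    }
}
```

In this sketch, a consumer that calls setReader and then goes straight to incrementToken fails immediately. The bug surfaces in a test rather than as corrupted tokenizer state.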

> To me, it seemed natural to overload #setReader so that our tokenizer was
> in a consistent state once it was called. It occurs to me to wonder about
> order: if #reset is called before #setReader, I'm up the creek unless I copy my
> reset implementation into a local override of #setReader.

This would also be a violation on the consumer's part (also detected
by MockTokenizer, in case you have such consumers, like query parsers or
whatever else you want to test).
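
A tokenizer can also detect the reset-before-setReader ordering itself without duplicating reset logic in setReader. The hypothetical OrderCheckingTokenizer below shows this. Its setReader still only stores the Reader, as a pending value. reset() refuses to run unless a fresh Reader is pending. This is an illustrative guard, not Lucene's implementation. It also rejects a second reset() without a new setReader(), which is a stricter choice than the discussion above requires.

```java
import java.io.Reader;

/** Hypothetical guard: reset() only proceeds after setReader() supplied a fresh Reader. */
class OrderCheckingTokenizer {
    private Reader input;          // the Reader currently being tokenized
    private Reader pendingInput;   // set by setReader(), consumed by reset()

    public void setReader(Reader input) {
        if (input == null) throw new NullPointerException("input must not be null");
        this.pendingInput = input;   // still only a setter
    }

    public void reset() {
        if (pendingInput == null) {
            throw new IllegalStateException("reset() called before setReader()");
        }
        input = pendingInput;
        pendingInput = null;
        // ...rebuild all per-document state from 'input' here...
    }

    boolean hasInput() {
        return input != null;
    }
}
```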