lucene-java-user mailing list archives

From Benson Margulies
Subject Re: reset versus setReader on TokenStream
Date Wed, 29 Aug 2012 20:18:49 GMT
If I'm following, you've created a division of labor between setReader and
reset.

We have a tokenizer that has a good deal of state, since it has to split
the input into chunks. If I'm following here, you'd recommend that we do
nothing special in setReader, but have #reset fix up all the state on the
assumption that we are starting from the beginning of something, and
we'd reinitialize our chunker over what was sitting in the protected
'input'. If someone called #setReader and neglected to call #reset, awful
things would happen, but you've warned them.
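
A minimal sketch of that division of labor, under stated assumptions: the
base class SketchTokenizer below is a hypothetical stand-in that only mirrors
the protected 'input' field, not Lucene's actual Tokenizer, and
ChunkingTokenizer, its chunk size, and the whitespace splitting are invented
for illustration. setReader only stores the reader; reset rebuilds all of the
chunking state.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the base-class contract; NOT Lucene's Tokenizer.
abstract class SketchTokenizer {
    protected Reader input;

    // Deliberately does nothing beyond storing the reader.
    public void setReader(Reader reader) {
        this.input = reader;
    }

    // Subclasses rebuild all per-document state here.
    public abstract void reset();

    // Returns the next token, or null once the input is exhausted.
    public abstract String next();
}

// Reads 'input' in fixed-size chunks and emits whitespace-separated tokens.
final class ChunkingTokenizer extends SketchTokenizer {
    private final char[] chunk;
    private int length;        // number of valid chars in 'chunk'
    private int offset;        // next unread position in 'chunk'
    private boolean exhausted; // true once the reader has reported end of input

    ChunkingTokenizer(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunk = new char[chunkSize];
    }

    @Override
    public void reset() {
        // Assume we are starting from the beginning of whatever 'input' holds.
        length = 0;
        offset = 0;
        exhausted = false;
    }

    @Override
    public String next() {
        StringBuilder token = new StringBuilder();
        while (true) {
            if (offset == length && (exhausted || !fill())) {
                break;
            }
            char c = chunk[offset++];
            if (!Character.isWhitespace(c)) {
                token.append(c);
            } else if (token.length() > 0) {
                break;
            }
        }
        return token.length() > 0 ? token.toString() : null;
    }

    // Loads the next chunk; returns false when nothing more could be read.
    private boolean fill() {
        try {
            int n = input.read(chunk, 0, chunk.length);
            offset = 0;
            length = Math.max(n, 0);
            exhausted = n < 0;
            return n > 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class ResetContractDemo {
    static List<String> drain(SketchTokenizer t) {
        List<String> out = new ArrayList<>();
        for (String tok = t.next(); tok != null; tok = t.next()) {
            out.add(tok);
        }
        return out;
    }

    public static void main(String[] args) {
        ChunkingTokenizer t = new ChunkingTokenizer(4);
        t.setReader(new StringReader("split the input"));
        t.reset();
        System.out.println(drain(t)); // [split, the, input]

        t.setReader(new StringReader("fresh text")); // caller forgets reset
        System.out.println(drain(t)); // [] because the old 'exhausted' flag survives

        t.reset();
        System.out.println(drain(t)); // [fresh, text]
    }
}
```

In this sketch the failure after a forgotten reset is mild (no tokens at all);
a tokenizer with richer chunking state could fail in less obvious ways.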

To me, it seemed natural to overload #setReader so that our tokenizer was
in a consistent state once it was called. It occurs to me to wonder about
order: if #reset is called before #setReader, I'm up a creek unless I copy my
reset implementation into a local override of #setReader.
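
The ordering concern can be sketched as follows, again against a hypothetical
stand-in base class (ReaderBackedStream) rather than Lucene's real one;
whether the real base class permits overriding setReader is an assumption to
check against the Lucene version in use. Instead of copying the reset logic,
both methods call one private helper (rebindChunker, an invented name), so
the chunker is bound to the current reader whichever call comes last. The
"chunker" here is just a BufferedReader that yields one line per chunk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the base class; NOT Lucene's Tokenizer.
abstract class ReaderBackedStream {
    protected Reader input;

    public void setReader(Reader reader) {
        this.input = reader;
    }

    public abstract void reset();

    // Returns the next chunk, or null at end of input.
    public abstract String next();
}

final class OrderTolerantTokenizer extends ReaderBackedStream {
    // State that is bound to a specific reader, like the chunker in the message.
    private BufferedReader chunker;

    // Single place that rebuilds reader-bound state.
    private void rebindChunker() {
        chunker = (input == null) ? null : new BufferedReader(input);
    }

    @Override
    public void setReader(Reader reader) {
        super.setReader(reader);
        rebindChunker(); // consistent even if reset() was called earlier
    }

    @Override
    public void reset() {
        rebindChunker(); // consistent if reset() is called after setReader()
    }

    @Override
    public String next() {
        if (chunker == null) {
            throw new IllegalStateException("setReader was never called");
        }
        try {
            return chunker.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class OrderDemo {
    static List<String> drain(ReaderBackedStream s) {
        List<String> out = new ArrayList<>();
        for (String chunk = s.next(); chunk != null; chunk = s.next()) {
            out.add(chunk);
        }
        return out;
    }

    public static void main(String[] args) {
        OrderTolerantTokenizer t = new OrderTolerantTokenizer();

        t.reset();                                  // reset first ...
        t.setReader(new StringReader("one\ntwo"));  // ... then setReader
        System.out.println(drain(t)); // [one, two]

        t.setReader(new StringReader("three"));     // the documented order
        t.reset();
        System.out.println(drain(t)); // [three]
    }
}
```

Without the setReader override, the reset-then-setReader order would leave
the chunker wrapping the previous reader, which is the situation described
above.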
