lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: reset versus setReader on TokenStream
Date Wed, 29 Aug 2012 19:37:40 GMT
ok, let's help improve it: I think these have likely always been confusing.

Before, they were both named reset: reset() and reset(Reader), even though
they are unrelated. I thought the rename would help this :)

Does the TokenStream workflow here help?
http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/analysis/TokenStream.html
Basically reset() is a mandatory thing the consumer must call. It just
means 'reset any mutable state so you can be reused for processing
again'.
This applies to any TokenStream: Tokenizers, TokenFilters, or
even some direct descendant you make that parses byte arrays, or
whatever.

This means that if you are keeping some state across tokens (like
StopFilter's #skippedTokens), reset() is where you would set it back
to 0.
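
To make the point about per-pass state concrete, here is a minimal sketch in plain Java. SketchStopFilter, setInput and next() are hypothetical stand-ins invented for illustration; they are not Lucene's StopFilter or its API (a real TokenFilter overrides incrementToken() and reset()). The counter here simply accumulates over one pass, which is a simplification.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a stop filter; NOT Lucene's StopFilter. */
class SketchStopFilter {
    private final Set<String> stopWords;
    private Iterator<String> input;
    private int skippedTokens = 0; // mutable state kept across tokens

    SketchStopFilter(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    /** Assumed helper: hands the filter a new token source. */
    void setInput(Iterator<String> input) {
        this.input = input;
    }

    /** Clears per-pass state so the instance can be reused. */
    void reset() {
        skippedTokens = 0;
    }

    /** Returns the next non-stop token, or null when the input is exhausted. */
    String next() {
        while (input.hasNext()) {
            String token = input.next();
            if (!stopWords.contains(token)) {
                return token;
            }
            skippedTokens++;
        }
        return null;
    }

    int skippedTokens() {
        return skippedTokens;
    }
}
```

If reset() did not zero the counter, a second pass over new input would start with the count left over from the previous one.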

setReader(Reader) is only on Tokenizer; it means replace the Reader
with a different one to be processed.
The fact that CharTokenizer is doing 'reset()-like-stuff' in here is
bogus IMO, but I don't think it will cause any bugs. Don't emulate it
:)
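
The split between the two calls can be sketched as follows. This is plain Java with a hypothetical SketchTokenizer, not Lucene's Tokenizer: the whitespace splitting, next() and consume() are assumptions made for illustration, and the end() and close() steps of the real TokenStream workflow are left out. In this sketch setReader only swaps the input and reset only clears state, so reset alone does not rewind anything.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical whitespace tokenizer; a stand-in, NOT Lucene's Tokenizer. */
class SketchTokenizer {
    private Reader input;
    private int tokensEmitted = 0; // mutable per-pass state

    /** Only replaces the source to be processed; touches no other state. */
    void setReader(Reader input) {
        this.input = input;
    }

    /** Only clears mutable state; it does not rewind or re-read the Reader. */
    void reset() {
        tokensEmitted = 0;
    }

    /** Returns the next whitespace-delimited token, or null at end of input. */
    String next() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) {
                        break; // token complete
                    }
                } else {
                    sb.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (sb.length() == 0) {
            return null;
        }
        tokensEmitted++;
        return sb.toString();
    }

    int tokensEmitted() {
        return tokensEmitted;
    }

    /** Consumer side: new Reader first, then the mandatory reset(), then iterate. */
    static List<String> consume(SketchTokenizer tokenizer, String text) {
        tokenizer.setReader(new StringReader(text));
        tokenizer.reset();
        List<String> tokens = new ArrayList<>();
        for (String t = tokenizer.next(); t != null; t = tokenizer.next()) {
            tokens.add(t);
        }
        return tokens;
    }
}
```

Calling consume twice on the same instance reuses it for two different inputs; calling reset() again without a new Reader leaves the exhausted Reader exhausted.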

On Wed, Aug 29, 2012 at 3:29 PM, Benson Margulies <benson@basistech.com> wrote:
> I've read the javadoc through a few times, but I confess that I'm still
> feeling dense.
>
> Are all tokenizers responsible for implementing some way of retaining the
> contents of their reader, so that a call to reset without a call to
> setReader rewinds? I note that CharTokenizer doesn't implement #reset,
> which leads me to suspect that I'm not responsible for the rewind behavior.



-- 
lucidworks.com
