lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: reset versus setReader on TokenStream
Date Wed, 29 Aug 2012 19:52:30 GMT
On Wed, Aug 29, 2012 at 3:45 PM, Benson Margulies <benson@basistech.com> wrote:
> On Wed, Aug 29, 2012 at 3:37 PM, Robert Muir <rcmuir@gmail.com> wrote:
>
>> OK, let's help improve it: I think these have likely always been confusing.
>>
>> Before, they were both named reset: reset() and reset(Reader), even though
>> they are unrelated. I thought the rename would help with this :)
>>
>> Does the TokenStream workflow described here help?
>>
>> http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/analysis/TokenStream.html
>> Basically, reset() is a mandatory call the consumer must make. It just
>> means 'reset any mutable state so you can be reused for processing
>> again'.
>>
>
> I really did read this. setReader I get; I don't understand what reset
> accomplishes. What does it mean to reuse a TokenStream without calling
> setReader to supply a new input?

TokenStream is more generic; it doesn't have to take a Reader. It can
take anything you want, e.g. a String, or a byte array of your Word
document, or whatever.

Tokenizer is a subclass that takes a Reader. It's the only one that has
setReader.

reset() doesn't mean rewind. It just means clearing any accumulated
internal state so it's ready for processing again.
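A toy model in plain Java may make the "reset() is not rewind" point concrete. Everything here is hypothetical: ToyStream, ReaderSource and CountingFilter only imitate the shape of TokenStream, Tokenizer and a stateful TokenFilter; they are not Lucene's API. The filter accumulates a running count; reset() clears that count and chains down to the source, but the Reader the source already consumed is not rewound, so without a new Reader there is nothing left to tokenize.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TokenStream (not Lucene's class).
abstract class ToyStream {
    abstract boolean incrementToken();
    abstract String term();
    // Clears accumulated state; it does NOT rewind any input.
    abstract void reset();
}

// Hypothetical stand-in for a Tokenizer: splits a Reader on whitespace.
class ReaderSource extends ToyStream {
    private Reader input;
    private final StringBuilder buf = new StringBuilder();
    private String current;

    ReaderSource(Reader input) { this.input = input; }

    // The only way to supply new input, like Tokenizer.setReader.
    void setReader(Reader input) { this.input = input; }

    @Override boolean incrementToken() {
        buf.setLength(0);
        try {
            int c;
            while ((c = input.read()) != -1) {
                if (!Character.isWhitespace(c)) buf.append((char) c);
                else if (buf.length() > 0) break; // end of the current token
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (buf.length() == 0) return false; // Reader is exhausted
        current = buf.toString();
        return true;
    }

    @Override String term() { return current; }

    @Override void reset() { buf.setLength(0); current = null; }
}

// Hypothetical stateful filter that prefixes each token with its position.
class CountingFilter extends ToyStream {
    private final ToyStream input;
    private int count; // accumulated state that reset() must clear
    private String current;

    CountingFilter(ToyStream input) { this.input = input; }

    @Override boolean incrementToken() {
        if (!input.incrementToken()) return false;
        count++;
        current = count + ":" + input.term();
        return true;
    }

    @Override String term() { return current; }

    @Override void reset() {
        count = 0;
        current = null;
        input.reset(); // chain the reset down to the wrapped stream
    }

    int count() { return count; }
}

class ResetDemo {
    static List<String> drain(ToyStream s) {
        List<String> out = new ArrayList<>();
        while (s.incrementToken()) out.add(s.term());
        return out;
    }

    public static void main(String[] args) {
        ReaderSource source = new ReaderSource(new StringReader("quick brown fox"));
        CountingFilter filter = new CountingFilter(source);
        filter.reset();
        System.out.println(drain(filter)); // [1:quick, 2:brown, 3:fox]
        filter.reset();                    // clears the count, but rewinds nothing
        System.out.println(drain(filter)); // []
    }
}
```

In this model, supplying fresh input (source.setReader(new StringReader(...))) followed by reset() on the outer filter is what makes the chain reusable; reset() alone only clears state.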

So if I made a StringTokenizer class that extends Tokenizer, I would
probably add setString(String s) to it so I could set new String
objects on it, but consumers must always call reset() on the entire
chain (the outer stop filters, synonym filters, all the stuff that
might be keeping state). This reset() call chains down through all
the TokenStreams.
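The setString idea and the chained reset() can be sketched the same way, again as a hypothetical plain-Java model rather than Lucene code. StringTokenStream takes a String instead of a Reader, LowercasingFilter is stateless but still forwards reset(), and DedupFilter keeps a per-document set of seen terms: exactly the kind of state that the consumer's reset() on the outermost stream has to clear.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical minimal stream contract; not Lucene's TokenStream.
abstract class ToyTokenStream {
    abstract boolean incrementToken();
    abstract String term();
    abstract void reset(); // clear accumulated state, chaining to any input
}

// Hypothetical source that takes a String instead of a Reader.
class StringTokenStream extends ToyTokenStream {
    private String[] parts = new String[0];
    private int next;
    private String current;

    // Supplies new input, playing the role setReader plays for a Tokenizer.
    void setString(String s) {
        String trimmed = s.trim();
        parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        next = 0;
    }

    @Override boolean incrementToken() {
        if (next >= parts.length) return false;
        current = parts[next++];
        return true;
    }

    @Override String term() { return current; }

    // Clears per-token state only; it does not rewind to parts[0].
    @Override void reset() { current = null; }
}

// Hypothetical stateless filter: still forwards reset() down the chain.
class LowercasingFilter extends ToyTokenStream {
    private final ToyTokenStream input;
    LowercasingFilter(ToyTokenStream input) { this.input = input; }
    @Override boolean incrementToken() { return input.incrementToken(); }
    @Override String term() { return input.term().toLowerCase(Locale.ROOT); }
    @Override void reset() { input.reset(); }
}

// Hypothetical stateful filter: drops terms already seen in this document.
class DedupFilter extends ToyTokenStream {
    private final ToyTokenStream input;
    private final Set<String> seen = new HashSet<>(); // per-document state
    DedupFilter(ToyTokenStream input) { this.input = input; }

    @Override boolean incrementToken() {
        while (input.incrementToken()) {
            if (seen.add(input.term())) return true; // first occurrence only
        }
        return false;
    }

    @Override String term() { return input.term(); }

    @Override void reset() {
        seen.clear();
        input.reset(); // chains down through LowercasingFilter to the source
    }
}

class ChainDemo {
    // Consumer side: set new input on the source, reset() the OUTERMOST
    // stream, then pull tokens until the chain is exhausted.
    static List<String> analyze(StringTokenStream source, ToyTokenStream chain,
                                String text, boolean callReset) {
        source.setString(text);
        if (callReset) chain.reset();
        List<String> out = new ArrayList<>();
        while (chain.incrementToken()) out.add(chain.term());
        return out;
    }

    public static void main(String[] args) {
        StringTokenStream source = new StringTokenStream();
        ToyTokenStream chain = new DedupFilter(new LowercasingFilter(source));
        System.out.println(analyze(source, chain, "The cat saw the dog", true)); // [the, cat, saw, dog]
        System.out.println(analyze(source, chain, "the dog ran", true));         // [the, dog, ran]
        System.out.println(analyze(source, chain, "the dog ran", false));        // []
    }
}
```

The third call skips reset(), so the dedup set from the previous input survives and every token is dropped; that stale state is what the mandatory reset() guards against.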

-- 
lucidworks.com


