lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: How is incrementToken supposed to detect the lack of reset()?
Date Tue, 07 Jan 2014 20:59:25 GMT
Benson, do you want to open an issue to fix this constructor so that it
does not take a Reader? (There might be one already, but let's make a new one.)

These things are supposed to be reused, and have setReader for that
purpose. I think it's confusing, and contributes to bugs, that you have
to have logic in e.g. the ctor and THEN ALSO in reset().

If someone does it correctly in the ctor but only tests "one time",
they might think everything is working.
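
As a rough illustration of the reuse point above, here is a self-contained
sketch. It is not Lucene's actual Tokenizer API: the class
WhitespaceTokenizerSketch, its fields, and the tokenize helper are all made up
for this example. The idea it shows is that all per-stream setup lives in
reset(), so a second use goes through exactly the same path as the first, and
incrementToken() refuses to run if reset() was skipped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reusable tokenizer; NOT the Lucene Tokenizer API.
class WhitespaceTokenizerSketch {
    private Reader input;            // current source, swapped by setReader()
    private BufferedReader wrapped;  // per-stream state, rebuilt in reset()
    private boolean resetCalled;     // guards incrementToken()
    private String[] pending = new String[0];
    private int pos;

    // Assumption of this sketch: no Reader in the constructor,
    // because the instance is meant to be reused across inputs.
    WhitespaceTokenizerSketch() {}

    void setReader(Reader reader) {
        this.input = reader;
        this.resetCalled = false;    // a new source always needs a fresh reset()
    }

    // The only place where per-stream state is built.
    void reset() {
        if (input == null) {
            throw new IllegalStateException("setReader() must be called before reset()");
        }
        wrapped = new BufferedReader(input);
        pending = new String[0];
        pos = 0;
        resetCalled = true;
    }

    // Returns the next whitespace-separated token, or null at end of input.
    String incrementToken() {
        if (!resetCalled) {
            throw new IllegalStateException("reset() must be called before incrementToken()");
        }
        try {
            while (pos >= pending.length) {
                String line = wrapped.readLine();
                if (line == null) {
                    return null;
                }
                String trimmed = line.trim();
                pending = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
                pos = 0;
            }
            return pending[pos++];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void close() {
        try {
            if (input != null) {
                input.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            input = null;
            wrapped = null;
            resetCalled = false;
        }
    }

    // Helper for the example: runs one full setReader/reset/consume/close cycle.
    static List<String> tokenize(WhitespaceTokenizerSketch t, String text) {
        t.setReader(new StringReader(text));
        t.reset();
        List<String> out = new ArrayList<>();
        for (String tok = t.incrementToken(); tok != null; tok = t.incrementToken()) {
            out.add(tok);
        }
        t.close();
        return out;
    }
}
```

Because the constructor takes no Reader in this sketch, even a test that
tokenizes only once still goes through setReader() and reset(), and a call to
incrementToken() without reset() fails loudly rather than reading stale state.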

On Tue, Jan 7, 2014 at 3:23 PM, Benson Margulies <benson@basistech.com> wrote:
> For the record of other people who implement tokenizers:
>
> Say that your tokenizer has a constructor, like:
>
>     public MyTokenizer(Reader reader, ....) {
>         super(reader);
>         myWrappedInputDevice = new MyWrappedInputDevice(reader);
>     }
>
> Not a good idea. Tokenizer carefully manages the data flow from the
> constructor arg to the 'input' field. The correct form is:
>
>     public MyTokenizer(Reader reader, ....) {
>         super(reader);
>         myWrappedInputDevice = new MyWrappedInputDevice(this.input);
>     }
>
>
>
> On Tue, Jan 7, 2014 at 2:59 PM, Robert Muir <rcmuir@gmail.com> wrote:
>
>> See Tokenizer.java for the state machine logic. In general you should
>> not have to do anything if the tokenizer is well-behaved (e.g. close
>> calls super.close() and so on).
>>
>>
>>
>> On Tue, Jan 7, 2014 at 2:50 PM, Benson Margulies <bimargulies@gmail.com>
>> wrote:
>> > In 4.6.0,
>> > org.apache.lucene.analysis.BaseTokenStreamTestCase#checkResetException
>> > fails if incrementToken does not throw when reset() has not been called.
>> >
>> > How am I supposed to organize this in a Tokenizer? A quick look at
>> > CharTokenizer did not reveal any code for the purpose.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>>
