lucene-java-user mailing list archives

From Benson Margulies <ben...@basistech.com>
Subject Re: How is incrementToken supposed to detect the lack of reset()?
Date Tue, 07 Jan 2014 20:23:27 GMT
For the record, for other people who implement tokenizers:

Say that your tokenizer has a constructor like this:

    public MyTokenizer(Reader reader, ....) {
      super(reader);
      // Wraps the raw constructor argument, bypassing Tokenizer's bookkeeping.
      myWrappedInputDevice = new MyWrappedInputDevice(reader);
    }

This is not a good idea. Tokenizer carefully manages the flow of data from
the constructor argument to the 'input' field, and that bookkeeping is how a
missing reset() gets detected. The correct form is:

    public MyTokenizer(Reader reader, ....) {
      super(reader);
      // Wrap the 'input' field that Tokenizer manages, not the raw argument.
      myWrappedInputDevice = new MyWrappedInputDevice(this.input);
    }
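To make the pitfall concrete, here is a small self-contained Java model; it is not Lucene code. It assumes (as we read the 4.6 Tokenizer.java; please verify against your version) that the constructor argument is parked in a private pending field, that 'input' starts out as a sentinel reader which throws IllegalStateException, and that reset() is what copies the pending reader into 'input'. The names ModelTokenizer, WrapsConstructorArg, WrapsInput, throwsWithoutReset and firstTokenAfterReset are hypothetical, and java.io.BufferedReader stands in for MyWrappedInputDevice. One consequence of that assumption: a wrapper built over 'input' inside the constructor only ever sees the sentinel, so this sketch also re-creates it in reset().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical, simplified model of the reset() bookkeeping we assume in
// Lucene 4.6's Tokenizer. This is NOT Lucene's code; it only shows the idea.
public class ResetContractDemo {
  abstract static class ModelTokenizer {
    // Sentinel: any read before reset() fails (modeled on ILLEGAL_STATE_READER).
    static final Reader ILLEGAL_STATE_READER = new Reader() {
      @Override
      public int read(char[] cbuf, int off, int len) {
        throw new IllegalStateException("contract violation: reset() call missing");
      }

      @Override
      public void close() {}
    };

    protected Reader input = ILLEGAL_STATE_READER; // what subclasses should read
    private Reader inputPending;                   // constructor argument, parked

    protected ModelTokenizer(Reader reader) {
      this.inputPending = reader;
    }

    public void reset() {
      input = inputPending; // only now does 'input' see the real text
      inputPending = ILLEGAL_STATE_READER;
    }

    public abstract String nextToken() throws IOException;
  }

  // Anti-pattern: wraps the constructor argument, so the sentinel is bypassed
  // and reading without reset() silently succeeds.
  static final class WrapsConstructorArg extends ModelTokenizer {
    private final BufferedReader device;

    WrapsConstructorArg(Reader reader) {
      super(reader);
      device = new BufferedReader(reader);
    }

    @Override
    public String nextToken() throws IOException {
      return device.readLine();
    }
  }

  // Reads only through 'input'; the wrapper is rebuilt in reset() because,
  // in this model, 'input' still holds the sentinel while the constructor runs.
  static final class WrapsInput extends ModelTokenizer {
    private BufferedReader device;

    WrapsInput(Reader reader) {
      super(reader);
      device = new BufferedReader(input); // wraps the sentinel until reset()
    }

    @Override
    public void reset() {
      super.reset();
      device = new BufferedReader(input); // now wraps the real reader
    }

    @Override
    public String nextToken() throws IOException {
      return device.readLine();
    }
  }

  // Roughly the idea behind a missing-reset check: consume without reset().
  static boolean throwsWithoutReset(ModelTokenizer t) {
    try {
      t.nextToken();
      return false;
    } catch (IllegalStateException expected) {
      return true;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static String firstTokenAfterReset(ModelTokenizer t) {
    t.reset();
    try {
      return t.nextToken();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(throwsWithoutReset(new WrapsConstructorArg(new StringReader("a b")))); // false
    System.out.println(throwsWithoutReset(new WrapsInput(new StringReader("a b"))));          // true
    System.out.println(firstTokenAfterReset(new WrapsInput(new StringReader("hello"))));      // hello
  }
}
```

Note that WrapsInput keeps no "was reset() called?" flag of its own: reading through 'input' is enough for the sentinel to raise the error, which fits the point below that a well-behaved tokenizer should not have to do anything special.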



On Tue, Jan 7, 2014 at 2:59 PM, Robert Muir <rcmuir@gmail.com> wrote:

> See Tokenizer.java for the state machine logic. In general you should
> not have to do anything if the tokenizer is well-behaved (e.g. close
> calls super.close() and so on).
>
>
>
> On Tue, Jan 7, 2014 at 2:50 PM, Benson Margulies <bimargulies@gmail.com> wrote:
> > In 4.6.0, org.apache.lucene.analysis.BaseTokenStreamTestCase#checkResetException
> > fails if incrementToken does not throw when reset() was never called.
> >
> > How am I supposed to organize this in a Tokenizer? A quick look at
> > CharTokenizer did not reveal any code for the purpose.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
