lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: reset versus setReader on TokenStream
Date Wed, 29 Aug 2012 20:08:27 GMT
On Wed, Aug 29, 2012 at 3:58 PM, Benson Margulies <benson@basistech.com> wrote:
> I think I'm beginning to get the idea. Is the following plausible?
>
> At the bottom of the stack, there's an actual source of data -- like a
> tokenizer. For one of those, reset() is a bit silly, and something like
> setReader is the brains of the operation.

Actually, I think setReader() is the silly one in most cases for
Tokenizers. Most tokenizers should never override it (in fact,
technically we could make it final to make that super clear, but that
might be a bit over the top).

The default implementation in Tokenizer.java should almost always
suffice, as it does what you would expect a setter to do in Java:

  public void setReader(Reader input) throws IOException {
    assert input != null: "input must not be null";
    this.input = input;
  }

So let's take your CharTokenizer example:

  @Override
  public void setReader(Reader input) throws IOException {
    super.setReader(input);
    bufferIndex = 0;
    offset = 0;
    dataLen = 0;
    finalOffset = 0;
    ioBuffer.reset(); // make sure to reset the IO buffer!!
  }

Really this is bogus. I think it should not override this method at
all, and should instead reset its internal state in reset():

  @Override
  public void reset() throws IOException {
    // reset our internal state
    bufferIndex = 0;
    offset = 0;
    dataLen = 0;
    finalOffset = 0;
    ioBuffer.reset(); // make sure to reset the IO buffer!!
  }
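
To make the division of labor concrete, here is a rough, self-contained
sketch that does not depend on Lucene at all. SketchTokenizer,
WhitespaceSketchTokenizer, next(), and tokenize() are made-up stand-ins
for illustration, not real Lucene classes or methods, and making
setReader() final is only the hypothetical idea mentioned above. The
point is just that setReader() stays a plain setter, per-stream state is
cleared in reset(), and the consumer calls setReader() and then reset()
before each use:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ResetSketch {

  // Hypothetical stand-in for a tokenizer base class; NOT Lucene's Tokenizer.
  static abstract class SketchTokenizer {
    protected Reader input;

    // Plain setter; final here so subclasses cannot hide state resets in it.
    public final void setReader(Reader input) {
      assert input != null : "input must not be null";
      this.input = input;
    }

    // Subclasses clear their per-stream state here.
    public void reset() throws IOException {}

    // Returns the next token, or null at end of input (hypothetical API).
    public abstract String next() throws IOException;
  }

  // Hypothetical whitespace tokenizer that tracks how many chars it has read.
  static class WhitespaceSketchTokenizer extends SketchTokenizer {
    private int offset = 0; // characters consumed from the current reader

    @Override
    public void reset() throws IOException {
      offset = 0; // reset our internal state
    }

    public int offset() {
      return offset;
    }

    @Override
    public String next() throws IOException {
      StringBuilder sb = new StringBuilder();
      int c;
      while ((c = input.read()) != -1) {
        offset++;
        if (Character.isWhitespace(c)) {
          if (sb.length() > 0) return sb.toString();
        } else {
          sb.append((char) c);
        }
      }
      return sb.length() > 0 ? sb.toString() : null;
    }
  }

  // Consumer workflow for a reused instance: setReader(), reset(), consume.
  static List<String> tokenize(WhitespaceSketchTokenizer t, String text) {
    try {
      t.setReader(new StringReader(text));
      t.reset();
      List<String> out = new ArrayList<>();
      for (String tok = t.next(); tok != null; tok = t.next()) {
        out.add(tok);
      }
      return out;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    WhitespaceSketchTokenizer t = new WhitespaceSketchTokenizer();
    System.out.println(tokenize(t, "foo bar") + " offset=" + t.offset());
    // Same instance, new reader: offset starts from zero again after reset().
    System.out.println(tokenize(t, "baz") + " offset=" + t.offset());
  }
}
```

If reset() did not clear offset, the second call on the reused instance
would keep counting from where the first input left off.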

Does that make sense?

-- 
lucidworks.com
