lucene-dev mailing list archives

From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: Problem with CharStream and Tokenizers with custom reset(Reader) method
Date Thu, 10 Sep 2009 22:27:49 GMT
I've been seeing strange behavior that may be related to this. Sometimes a
query parsed and analyzed with Solr analyzers comes out as only its first
clause, fairly randomly, while at other times the exact same query is
parsed and analyzed into the full, correct query with all clauses. It's so
baffling that I haven't figured out an approach to debugging it. I wonder
whether it's related to this stream-resetting issue.

On Thu, Sep 10, 2009 at 7:54 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
> When reviewing the new CharStream code added to Tokenizers, I found a
> serious problem with backwards compatibility and with other Tokenizers
> that do not override reset(CharStream).
>
> The problem is that, e.g., CharTokenizer only overrides reset(Reader):
>
>  public void reset(Reader input) throws IOException {
>    super.reset(input);
>    bufferIndex = 0;
>    offset = 0;
>    dataLen = 0;
>  }
>
> If you reset such a Tokenizer with another CharStream (rather than a plain
> Reader), this method is never called, which breaks the whole Tokenizer.
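The breakage follows from Java overload resolution: the overload is picked at compile time from the argument's static type, so a call with a CharStream argument binds to reset(CharStream), which CharTokenizer does not override, and its buffer/offset state is never cleared. The sketch reproduces this with simplified, hypothetical stand-in classes (OverloadPitfall, CharStream, Tokenizer, MyCharTokenizer); they are not the real org.apache.lucene.analysis types, and the base reset methods here only store the input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical, simplified stand-ins; NOT the real Lucene classes.
public class OverloadPitfall {

    // Stand-in for CharStream: a Reader subclass wrapping another Reader.
    static class CharStream extends Reader {
        private final Reader in;
        CharStream(Reader in) { this.in = in; }
        @Override public int read(char[] buf, int off, int len) throws IOException {
            return in.read(buf, off, len);
        }
        @Override public void close() throws IOException { in.close(); }
    }

    // Stand-in for Tokenizer exposing both reset overloads.
    static class Tokenizer {
        protected Reader input;
        public void reset(Reader input) { this.input = input; }
        public void reset(CharStream input) { this.input = input; }
    }

    // Stand-in for CharTokenizer: overrides only reset(Reader).
    static class MyCharTokenizer extends Tokenizer {
        int bufferIndex, offset, dataLen;
        @Override public void reset(Reader input) {
            super.reset(input);
            bufferIndex = 0;
            offset = 0;
            dataLen = 0;
        }
    }

    public static void main(String[] args) {
        MyCharTokenizer tok = new MyCharTokenizer();
        tok.offset = 42; // simulate state left over from a previous stream

        CharStream cs = new CharStream(new StringReader("new text"));
        tok.reset(cs); // static type CharStream: base reset(CharStream) runs, state kept
        System.out.println("offset after reset(CharStream): " + tok.offset); // 42

        tok.reset((Reader) cs); // static type Reader: the subclass override runs
        System.out.println("offset after reset(Reader): " + tok.offset); // 0
    }
}
```

Casting the argument to Reader, as in the second call, reaches the subclass override, but every caller would have to know to do that.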
>
> As CharStream extends Reader, I propose to remove this reset(CharStream)
> method and simply do an instanceof check to detect whether the supplied
> Reader is not a CharStream and, if so, wrap it. We could also remove the
> extra ctor (because most Tokenizers have no support for passing
> CharStreams). If the ctor also checks with instanceof and wraps as needed,
> the code is backwards compatible and we do not need to add additional
> ctors in subclasses.
>
> As this instanceof check is always done in CharReader.get(), why not
> remove ctor(CharStream) and reset(CharStream) completely?
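A minimal sketch of the proposal, again using hypothetical stand-ins rather than Lucene's actual classes (SingleResetSketch, MyCharTokenizer and the stand-in CharStream/CharReader are illustrative only). The base Tokenizer keeps just a Reader-taking constructor and reset(Reader), and both route the input through a CharReader.get()-style helper that returns a CharStream unchanged and wraps any other Reader. With no CharStream overload left, a subclass that overrides only reset(Reader) sees every reset.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical, simplified stand-ins illustrating the proposal; NOT Lucene's classes.
public class SingleResetSketch {

    // Stand-in for the abstract CharStream type.
    abstract static class CharStream extends Reader { }

    // Stand-in wrapper with a get() helper performing the instanceof check.
    static final class CharReader extends CharStream {
        private final Reader in;
        private CharReader(Reader in) { this.in = in; }

        // Pass CharStreams through unchanged; wrap any other Reader.
        static CharStream get(Reader input) {
            return (input instanceof CharStream) ? (CharStream) input : new CharReader(input);
        }

        @Override public int read(char[] buf, int off, int len) throws IOException {
            return in.read(buf, off, len);
        }
        @Override public void close() throws IOException { in.close(); }
    }

    // Only one ctor and one reset, both taking a plain Reader.
    static class Tokenizer {
        protected CharStream input;
        Tokenizer(Reader input) { this.input = CharReader.get(input); }
        public void reset(Reader input) { this.input = CharReader.get(input); }
    }

    // Same shape as CharTokenizer: overrides only reset(Reader).
    static class MyCharTokenizer extends Tokenizer {
        int bufferIndex, offset, dataLen;
        MyCharTokenizer(Reader input) { super(input); }
        @Override public void reset(Reader input) {
            super.reset(input);
            bufferIndex = 0;
            offset = 0;
            dataLen = 0;
        }
    }

    public static void main(String[] args) {
        MyCharTokenizer tok = new MyCharTokenizer(new StringReader("first"));
        tok.offset = 42; // simulate leftover state

        CharStream cs = CharReader.get(new StringReader("second"));
        tok.reset(cs); // resolves to reset(Reader), so the override always runs
        System.out.println(tok.offset);      // 0
        System.out.println(tok.input == cs); // true: not wrapped a second time
    }
}
```

Because the check happens inside get(), an input that is already a CharStream is passed through as-is instead of being wrapped again.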
>
> Any thoughts?
>
> I would like to fix this somehow before RC4, I'm sorry :(
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
