lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Problem with CharStream and Tokenizers with custom reset(Reader) method
Date Thu, 10 Sep 2009 15:21:25 GMT
Yeah, let's open an issue and mark it as a blocker - I'll hold RC4 for it
(I was just about to push it when I caught this email).

Uwe Schindler wrote:
> I tested the attached patch; all tests still compile and pass as expected
> (as CharStream extends Reader).
>
> I think I should open an issue?
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>   
>> -----Original Message-----
>> From: Uwe Schindler [mailto:uwe@thetaphi.de]
>> Sent: Thursday, September 10, 2009 4:54 PM
>> To: java-dev@lucene.apache.org
>> Subject: Problem with CharStream and Tokenizers with custom reset(Reader)
>> method
>>
>> When reviewing the new CharStream code added to Tokenizers, I found a
>> serious backwards-compatibility problem that affects Tokenizers which do
>> not override reset(CharStream).
>>
>> The problem is that, e.g., CharTokenizer only overrides reset(Reader):
>>
>>   public void reset(Reader input) throws IOException {
>>     super.reset(input);
>>     bufferIndex = 0;
>>     offset = 0;
>>     dataLen = 0;
>>   }
>>
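>> If you reset such a Tokenizer with a CharStream (rather than a plain
>> Reader), the call resolves to the inherited reset(CharStream) instead, so
>> this override is never called and its state is never cleared, which
>> breaks the whole Tokenizer.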
>>
>> As CharStream extends Reader, I propose to remove the reset(CharStream)
>> method and simply do an instanceof check to detect whether the supplied
>> Reader is not a CharStream and, if so, wrap it. We could also remove the
>> extra ctor (because most Tokenizers have no support for passing
>> CharStreams). If the ctor also checks with instanceof and wraps as
>> needed, the code is backwards compatible and we do not need to add
>> additional ctors in subclasses.
>>
>> As this instanceof check is always done in CharReader.get() anyway, why
>> not remove ctor(CharStream) and reset(CharStream) completely?
>>
>> Any thoughts?
>>
>> I would like to fix this somehow before RC4, I'm sorry :(
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>     
>
>   

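For reference, the instanceof-and-wrap idea from the quoted message could be
sketched roughly as below. This is only an illustration under assumptions:
every class name here (ResetSketch, SketchCharStream, SketchCharReader,
SketchTokenizer, SketchCharTokenizer) is a hypothetical stand-in and not the
real Lucene code or the attached patch. The point it tries to show is that,
with a single reset(Reader) entry point, a subclass that only overrides
reset(Reader) still gets called when a CharStream is passed in.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// All names below are hypothetical stand-ins, not the real Lucene classes.
class ResetSketch {

  /** Stand-in for CharStream: a Reader that can also correct offsets. */
  static abstract class SketchCharStream extends Reader {
    abstract int correctOffset(int currentOff);
  }

  /** Stand-in for CharReader: adapts a plain Reader to SketchCharStream. */
  static final class SketchCharReader extends SketchCharStream {
    private final Reader in;

    private SketchCharReader(Reader in) { this.in = in; }

    /** Wraps only when needed; an existing stream is returned unchanged. */
    static SketchCharStream get(Reader input) {
      return input instanceof SketchCharStream
          ? (SketchCharStream) input
          : new SketchCharReader(input);
    }

    @Override int correctOffset(int currentOff) { return currentOff; }

    @Override public int read(char[] buf, int off, int len) throws IOException {
      return in.read(buf, off, len);
    }

    @Override public void close() throws IOException { in.close(); }
  }

  /** Base tokenizer: one reset(Reader), no separate reset(CharStream). */
  static class SketchTokenizer {
    protected SketchCharStream input;

    SketchTokenizer(Reader input) { this.input = SketchCharReader.get(input); }

    public void reset(Reader input) {
      this.input = SketchCharReader.get(input);
    }
  }

  /** Subclass that overrides only reset(Reader), like the quoted snippet. */
  static class SketchCharTokenizer extends SketchTokenizer {
    // Non-zero values pretend that some input was already consumed.
    int bufferIndex = 5, offset = 5, dataLen = 5;

    SketchCharTokenizer(Reader input) { super(input); }

    @Override public void reset(Reader input) {
      super.reset(input);
      bufferIndex = 0;
      offset = 0;
      dataLen = 0;
    }
  }

  public static void main(String[] args) {
    SketchCharTokenizer t = new SketchCharTokenizer(new StringReader("first"));
    SketchCharStream stream = SketchCharReader.get(new StringReader("second"));
    t.reset(stream); // resolves to the overridden reset(Reader)
    System.out.println(t.bufferIndex + " " + t.offset + " " + t.dataLen
        + " " + (t.input == stream));
  }
}
```

Running main should print "0 0 0 true": the subclass state is cleared, and
the stream that was passed in is used as-is rather than being wrapped again.
If a reset(SketchCharStream) overload were added back to the base class, the
same call would bind to that overload and skip the subclass override, which
is the failure described in the quoted message.
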

-- 
- Mark

http://www.lucidimagination.com
