tomcat-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bill Barker" <wbar...@wilshire.com>
Subject Re: Bug in B2C converter WAS: svn commit: r568307 - /tomcat/trunk/java/org/apache/tomcat/util/buf/B2CConverter.java
Date Mon, 27 Aug 2007 04:04:51 GMT

"Filip Hanik - Dev Lists" <devlists@hanik.com> wrote in message 
news:46D19889.1060407@hanik.com...
> Bill Barker wrote:
>> "Filip Hanik - Dev Lists" <devlists@hanik.com> wrote in message 
>> news:46CEDC28.6050102@hanik.com...
>>
>>> Bill Barker wrote:
>>>
>>>> "Filip Hanik - Dev Lists" <devlists@hanik.com> wrote in message 
>>>> news:46CD9438.3030106@hanik.com...
>>>>
>>>>
>>>>> Bill Barker wrote:
>>>>>
>>>>>
>>>>>> "Filip Hanik - Dev Lists" <devlists@hanik.com> wrote in message

>>>>>> news:46CC8730.6040504@hanik.com...
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Bill Barker wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> "Remy Maucherat" <remm@apache.org> wrote in message

>>>>>>>> news:46CB7F51.1030201@apache.org...
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Filip Hanik - Dev Lists wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Test Case and 5.5.x patch can be found here.
>>>>>>>>>> http://people.apache.org/~fhanik/tomcat/b2c/
>>>>>>>>>>
>>>>>>>>>> This is what is happening
>>>>>>>>>>
>>>>>>>>>> int cnt=conv.read( result, 0, BUFFER_SIZE );
>>>>>>>>>> is called with a "while (true)" statement,
>>>>>>>>>>
>>>>>>>>>> When the IntermediateInputStream.read returns -1,
the above 
>>>>>>>>>> statement returns cnt==1.
>>>>>>>>>> So to avoid calling conv.read, we must check to see
if we have 
>>>>>>>>>> more bytes to read by implementing the available()
method, to 
>>>>>>>>>> avoid the inputstream ever returning -1.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> It's possible, but I have a hard time understanding the
issue.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> The issue is that InputStreamReader reads 8192 bytes from

>>>>>>>> IntermediateInputStream on the first go.  It then translates
them 
>>>>>>>> into 2734 chars, but thinks that the last few bytes represent
an 
>>>>>>>> incomplete char, so holds onto them.  On the next call, 
>>>>>>>> IntermediateInputStream returns -1, so InputStreamReader
outputs 
>>>>>>>> the last char as best it can (resulting in returning 1).
 Then the 
>>>>>>>> IntermediateInputStream buffer is reset, and it can continue
on 
>>>>>>>> reading (but from the wrong position, resulting in corruption).
>>>>>>>>
>>>>>>>> Filip's patch is inelegant (better would be to use the ByteChunk

>>>>>>>> sink), but other than my looking for a better way to do it,
I can't 
>>>>>>>> come up with the required technical reason to porting the
base of 
>>>>>>>> it to 5.5 (of course, I could care less what he does in his
sandbox 
>>>>>>>> :).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> I've committed the fix to 5.5, if you find a more elegant way
of 
>>>>>>> solving the actual problem, feel free to revert it and commit

>>>>>>> another fix. I don't care about the how, as long as there is
a fix 
>>>>>>> that will be included in the tag 5.5.25 on Friday
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> No problem.  I can see how to do this better, but I'll wait until
the 
>>>>>> weekend to commit (since it's not totally trivial, I don't want a

>>>>>> one-day window for regression testing :).  That way 5.5.25 can go
out 
>>>>>> with your patch.  It doesn't include the NIO dependancy (which was
my 
>>>>>> only concern), so it works well enough for me for now.
>>>>>>
>>>>>>
>>>>>>
>>>>> according to the KISS principle, your fix would have to be less than
4 
>>>>> lines changed to be "more elegant" :)
>>>>>
>>>>>
>>>>>
>>>> Yes, it is more than 4 lines, but most of them are deletes :).  I've 
>>>> done it already on my local machine here, in case anybody wants RTC on 
>>>> the 5.5.x branch (and Filip's test case passes with flying colors :). 
>>>> I'm pretty much sure that there are no regressions for 5.5.x+, but I 
>>>> still need to look at 3.3.x,  and 4.1.x.
>>>>
>>>> If anyone is interested, I can post the patch files.  Otherwise, I'll 
>>>> assume that CTR is still in place, and you can veto it when I commit 
>>>> over the w/e ;).  Of course, if this message was meant as a pre-emptive 
>>>> veto, then I won't bother.
>>>>
>>>>
>>> it's your choice if you want to commit it before or after the tag today.
>>> If you wanna commit it before, then we are counting on your vote :)
>>>
>>>
>>
>> I've noticed a problem with using Reader.mark with multi-byte charsets 
>> (we have a hack in place that works for single-byte charsets).  I could 
>> just commit what I've got here (which should be no worse than before :), 
>> but I'd like to solve this once and for all first.
>>
>> Using Filip's example servlet, if you modify it to do:
>>             reader = request.getReader();
>> +            reader.mark(5);   // content length + terminator
>>             while (true) {
>>                 int c = reader.read();
>>                 if (c == -1 || c == '/')
>>                     break;
>>                 buf.append((char)c);
>>             }
>> +           reader.reset(); // throws IOException here
>>
>> With the current code (and what I have), the first call to reader.read 
>> requests 8192 chars, and produces 2734 chars.  The current code then 
>> results in throwing away the last 2729 chars and abandoning the mark. 
>> The best I've got until now preserves the 2729 chars, but still throws 
>> away the mark, and hence still throws an IOE when reset is called.
>>
>> Long story short, I'm not now sure that I can promise to commit a fix 
>> this weekend :(.
>>
>
> no worries, since UTF-8 can be anywhere between 1 and 6 bytes, wouldn't it 
> just be easier to do
> boolean markSupported() { return false; }
> and not worry about parsing the bytes correctly during a mark?
> the problem you are describing goes beyond that though, so take your time.
>

The problem would be that it would be a regression since probably 4.1.x :(. 
The patch I just committed is still a bit fragile (since it depends on 
InputBuffer knowing what CharChunk does internally), but since they are 
pretty stable classes, I think it should work.  Filip's example webapp works 
like a charm :).

> Filip 




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Mime
View raw message