tomcat-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bill Barker" <wbar...@wilshire.com>
Subject Re: Bug in B2C converter WAS: svn commit: r568307 - /tomcat/trunk/java/org/apache/tomcat/util/buf/B2CConverter.java
Date Sun, 26 Aug 2007 03:44:53 GMT

"Filip Hanik - Dev Lists" <devlists@hanik.com> wrote in message 
news:46CEDC28.6050102@hanik.com...
> Bill Barker wrote:
>> "Filip Hanik - Dev Lists" <devlists@hanik.com> wrote in message 
>> news:46CD9438.3030106@hanik.com...
>>
>>> Bill Barker wrote:
>>>
>>>> "Filip Hanik - Dev Lists" <devlists@hanik.com> wrote in message 
>>>> news:46CC8730.6040504@hanik.com...
>>>>
>>>>
>>>>> Bill Barker wrote:
>>>>>
>>>>>
>>>>>> "Remy Maucherat" <remm@apache.org> wrote in message 
>>>>>> news:46CB7F51.1030201@apache.org...
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Filip Hanik - Dev Lists wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Test Case and 5.5.x patch can be found here.
>>>>>>>> http://people.apache.org/~fhanik/tomcat/b2c/
>>>>>>>>
>>>>>>>> This is what is happening
>>>>>>>>
>>>>>>>> int cnt=conv.read( result, 0, BUFFER_SIZE );
>>>>>>>> is called with a "while (true)" statement,
>>>>>>>>
>>>>>>>> When the IntermediateInputStream.read returns -1, the above

>>>>>>>> statement returns cnt==1.
>>>>>>>> So to avoid calling conv.read, we must check to see if we
have more 
>>>>>>>> bytes to read by implementing the available() method, to
avoid the 
>>>>>>>> inputstream ever returning -1.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> It's possible, but I have a hard time understanding the issue.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> The issue is that InputStreamReader reads 8192 bytes from 
>>>>>> IntermediateInputStream on the first go.  It then translates them

>>>>>> into 2734 chars, but thinks that the last few bytes represent an

>>>>>> incomplete char, so holds onto them.  On the next call, 
>>>>>> IntermediateInputStream returns -1, so InputStreamReader outputs
the 
>>>>>> last char as best it can (resulting in returning 1).  Then the 
>>>>>> IntermediateInputStream buffer is reset, and it can continue on 
>>>>>> reading (but from the wrong position, resulting in corruption).
>>>>>>
>>>>>> Filip's patch is inelegant (better would be to use the ByteChunk

>>>>>> sink), but other than my looking for a better way to do it, I can't

>>>>>> come up with the required technical reason to porting the base of
it 
>>>>>> to 5.5 (of course, I could care less what he does in his sandbox
:).
>>>>>>
>>>>>>
>>>>>>
>>>>> I've committed the fix to 5.5, if you find a more elegant way of 
>>>>> solving the actual problem, feel free to revert it and commit another

>>>>> fix. I don't care about the how, as long as there is a fix that will

>>>>> be included in the tag 5.5.25 on Friday
>>>>>
>>>>>
>>>>>
>>>> No problem.  I can see how to do this better, but I'll wait until the 
>>>> weekend to commit (since it's not totally trivial, I don't want a 
>>>> one-day window for regression testing :).  That way 5.5.25 can go out 
>>>> with your patch.  It doesn't include the NIO dependancy (which was my 
>>>> only concern), so it works well enough for me for now.
>>>>
>>>>
>>> according to the KISS principle, your fix would have to be less than 4 
>>> lines changed to be "more elegant" :)
>>>
>>>
>>
>> Yes, it is more than 4 lines, but most of them are deletes :).  I've done 
>> it already on my local machine here, in case anybody wants RTC on the 
>> 5.5.x branch (and Filip's test case passes with flying colors :).  I'm 
>> pretty much sure that there are no regressions for 5.5.x+, but I still 
>> need to look at 3.3.x,  and 4.1.x.
>>
>> If anyone is interested, I can post the patch files.  Otherwise, I'll 
>> assume that CTR is still in place, and you can veto it when I commit over 
>> the w/e ;).  Of course, if this message was meant as a pre-emptive veto, 
>> then I won't bother.
>>
> it's your choice if you want to commit it before or after the tag today.
> If you wanna commit it before, then we are counting on your vote :)
>

I've noticed a problem with using Reader.mark with multi-byte charsets (we 
have a hack in place that works for single-byte charsets).  I could just 
commit what I've got here (which should be no worse than before :), but I'd 
like to solve this once and for all first.

Using Filip's example servlet, if you modify it to do:
            reader = request.getReader();
+            reader.mark(5);   // content length + terminator
            while (true) {
                int c = reader.read();
                if (c == -1 || c == '/')
                    break;
                buf.append((char)c);
            }
+           reader.reset(); // throws IOException here

With the current code (and what I have), the first call to reader.read 
requests 8192 chars, and produces 2734 chars.  The current code then results 
in throwing away the last 2729 chars and abandoning the mark.  The best I've 
got until now preserves the 2729 chars, but still throws away the mark, and 
hence still throws an IOE when reset is called.

Long story short, I'm not now sure that I can promise to commit a fix this 
weekend :(.

> Filip
>>
>>> Filip
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
>> For additional commands, e-mail: dev-help@tomcat.apache.org
>>
>>
>>
>> 




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Mime
View raw message