Mailing-List: contact dev-help@tomcat.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Tomcat Developers List" <dev@tomcat.apache.org>
Received-SPF: pass (athena.apache.org: local policy)
Message-ID: <46D19889.1060407@hanik.com>
Date: Sun, 26 Aug 2007 09:13:13 -0600
From: Filip Hanik - Dev Lists <devlists@hanik.com>
User-Agent: Thunderbird 2.0.0.6 (Windows/20070728)
MIME-Version: 1.0
To: Tomcat Developers List <dev@tomcat.apache.org>
Subject: Re: Bug in B2C converter WAS: svn commit: r568307 -
 /tomcat/trunk/java/org/apache/tomcat/util/buf/B2CConverter.java
References: <20070821221541.7FD151A981A@eris.apache.org>
 <46CB7531.1040004@hanik.com> <46CB7F51.1030201@apache.org>
 <fag84n$5fu$1@sea.gmane.org> <46CC8730.6040504@hanik.com>
 <fais3k$cs2$1@sea.gmane.org> <46CD9438.3030106@hanik.com>
 <falsg8$ja3$1@sea.gmane.org> <46CEDC28.6050102@hanik.com>
 <faqsug$ckn$1@sea.gmane.org>
In-Reply-To: <faqsug$ckn$1@sea.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Bill Barker wrote:
> "Filip Hanik - Dev Lists" <devlists@hanik.com> wrote in message 
> news:46CEDC28.6050102@hanik.com...
>   
>> Bill Barker wrote:
>>     
>>> "Filip Hanik - Dev Lists" <devlists@hanik.com> wrote in message 
>>> news:46CD9438.3030106@hanik.com...
>>>
>>>       
>>>> Bill Barker wrote:
>>>>
>>>>         
>>>>> "Filip Hanik - Dev Lists" <devlists@hanik.com> wrote in message 
>>>>> news:46CC8730.6040504@hanik.com...
>>>>>
>>>>>
>>>>>           
>>>>>> Bill Barker wrote:
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> "Remy Maucherat" <remm@apache.org> wrote in message 
>>>>>>> news:46CB7F51.1030201@apache.org...
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> Filip Hanik - Dev Lists wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Test Case and 5.5.x patch can be found here.
>>>>>>>>> http://people.apache.org/~fhanik/tomcat/b2c/
>>>>>>>>>
>>>>>>>>> This is what is happening
>>>>>>>>>
>>>>>>>>> int cnt=conv.read( result, 0, BUFFER_SIZE );
>>>>>>>>> is called with a "while (true)" statement,
>>>>>>>>>
>>>>>>>>> When the IntermediateInputStream.read returns -1, the above 
>>>>>>>>> statement returns cnt==1.
>>>>>>>>> So to avoid calling conv.read, we must check to see if we have more 
>>>>>>>>> bytes to read by implementing the available() method, to avoid the 
>>>>>>>>> inputstream ever returning -1.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> It's possible, but I have a hard time understanding the issue.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>> The issue is that InputStreamReader reads 8192 bytes from 
>>>>>>> IntermediateInputStream on the first go.  It then translates them 
>>>>>>> into 2734 chars, but thinks that the last few bytes represent an 
>>>>>>> incomplete char, so holds onto them.  On the next call, 
>>>>>>> IntermediateInputStream returns -1, so InputStreamReader outputs the 
>>>>>>> last char as best it can (resulting in returning 1).  Then the 
>>>>>>> IntermediateInputStream buffer is reset, and it can continue on 
>>>>>>> reading (but from the wrong position, resulting in corruption).
>>>>>>>
>>>>>>> Filip's patch is inelegant (better would be to use the ByteChunk 
>>>>>>> sink), but other than my looking for a better way to do it, I can't 
>>>>>>> come up with the required technical reason to porting the base of it 
>>>>>>> to 5.5 (of course, I could care less what he does in his sandbox :).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> I've committed the fix to 5.5, if you find a more elegant way of 
>>>>>> solving the actual problem, feel free to revert it and commit another 
>>>>>> fix. I don't care about the how, as long as there is a fix that will 
>>>>>> be included in the tag 5.5.25 on Friday
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>> No problem.  I can see how to do this better, but I'll wait until the 
>>>>> weekend to commit (since it's not totally trivial, I don't want a 
>>>>> one-day window for regression testing :).  That way 5.5.25 can go out 
>>>>> with your patch.  It doesn't include the NIO dependancy (which was my 
>>>>> only concern), so it works well enough for me for now.
>>>>>
>>>>>
>>>>>           
>>>> according to the KISS principle, your fix would have to be less than 4 
>>>> lines changed to be "more elegant" :)
>>>>
>>>>
>>>>         
>>> Yes, it is more than 4 lines, but most of them are deletes :).  I've done 
>>> it already on my local machine here, in case anybody wants RTC on the 
>>> 5.5.x branch (and Filip's test case passes with flying colors :).  I'm 
>>> pretty much sure that there are no regressions for 5.5.x+, but I still 
>>> need to look at 3.3.x,  and 4.1.x.
>>>
>>> If anyone is interested, I can post the patch files.  Otherwise, I'll 
>>> assume that CTR is still in place, and you can veto it when I commit over 
>>> the w/e ;).  Of course, if this message was meant as a pre-emptive veto, 
>>> then I won't bother.
>>>
>>>       
>> it's your choice if you want to commit it before or after the tag today.
>> If you wanna commit it before, then we are counting on your vote :)
>>
>>     
>
> I've noticed a problem with using Reader.mark with multi-byte charsets (we 
> have a hack in place that works for single-byte charsets).  I could just 
> commit what I've got here (which should be no worse than before :), but I'd 
> like to solve this once and for all first.
>
> Using Filip's example servlet, if you modify it to do:
>             reader = request.getReader();
> +            reader.mark(5);   // content length + terminator
>             while (true) {
>                 int c = reader.read();
>                 if (c == -1 || c == '/')
>                     break;
>                 buf.append((char)c);
>             }
> +           reader.reset(); // throws IOException here
>
> With the current code (and what I have), the first call to reader.read 
> requests 8192 chars, and produces 2734 chars.  The current code then results 
> in throwing away the last 2729 chars and abandoning the mark.  The best I've 
> got until now preserves the 2729 chars, but still throws away the mark, and 
> hence still throws an IOE when reset is called.
>
> Long story short, I'm not now sure that I can promise to commit a fix this 
> weekend :(.
>   

no worries, since UTF-8 can be anywhere between 1 and 6 bytes, wouldn't 
it just be easier to do
boolean markSupported() { return false; }
and not worry about parsing the bytes correctly during a mark?
the problem you are describing goes beyond that though, so take your time.

Filip

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org