Return-Path: Delivered-To: apmail-tomcat-dev-archive@www.apache.org Received: (qmail 59984 invoked from network); 26 Aug 2007 15:13:51 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 26 Aug 2007 15:13:51 -0000 Received: (qmail 54315 invoked by uid 500); 26 Aug 2007 15:13:40 -0000 Delivered-To: apmail-tomcat-dev-archive@tomcat.apache.org Received: (qmail 54292 invoked by uid 500); 26 Aug 2007 15:13:40 -0000 Mailing-List: contact dev-help@tomcat.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: "Tomcat Developers List" Delivered-To: mailing list dev@tomcat.apache.org Received: (qmail 54278 invoked by uid 99); 26 Aug 2007 15:13:40 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 26 Aug 2007 08:13:40 -0700 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [72.22.94.67] (HELO virtual.halosg.com) (72.22.94.67) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 26 Aug 2007 15:13:34 +0000 Received: (qmail 9239 invoked from network); 26 Aug 2007 10:12:30 -0500 Received: from 72-19-171-38.static.mesanetworks.net (HELO ?192.168.1.2?) (72.19.171.38) by halosg.com with SMTP; 26 Aug 2007 10:12:30 -0500 Message-ID: <46D19889.1060407@hanik.com> Date: Sun, 26 Aug 2007 09:13:13 -0600 From: Filip Hanik - Dev Lists User-Agent: Thunderbird 2.0.0.6 (Windows/20070728) MIME-Version: 1.0 To: Tomcat Developers List Subject: Re: Bug in B2C converter WAS: svn commit: r568307 - /tomcat/trunk/java/org/apache/tomcat/util/buf/B2CConverter.java References: <20070821221541.7FD151A981A@eris.apache.org> <46CB7531.1040004@hanik.com> <46CB7F51.1030201@apache.org> <46CC8730.6040504@hanik.com> <46CD9438.3030106@hanik.com> <46CEDC28.6050102@hanik.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org Bill Barker wrote: > "Filip Hanik - Dev Lists" wrote in message > news:46CEDC28.6050102@hanik.com... > >> Bill Barker wrote: >> >>> "Filip Hanik - Dev Lists" wrote in message >>> news:46CD9438.3030106@hanik.com... >>> >>> >>>> Bill Barker wrote: >>>> >>>> >>>>> "Filip Hanik - Dev Lists" wrote in message >>>>> news:46CC8730.6040504@hanik.com... >>>>> >>>>> >>>>> >>>>>> Bill Barker wrote: >>>>>> >>>>>> >>>>>> >>>>>>> "Remy Maucherat" wrote in message >>>>>>> news:46CB7F51.1030201@apache.org... >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Filip Hanik - Dev Lists wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Test Case and 5.5.x patch can be found here. >>>>>>>>> http://people.apache.org/~fhanik/tomcat/b2c/ >>>>>>>>> >>>>>>>>> This is what is happening >>>>>>>>> >>>>>>>>> int cnt=conv.read( result, 0, BUFFER_SIZE ); >>>>>>>>> is called with a "while (true)" statement, >>>>>>>>> >>>>>>>>> When the IntermediateInputStream.read returns -1, the above >>>>>>>>> statement returns cnt==1. >>>>>>>>> So to avoid calling conv.read, we must check to see if we have more >>>>>>>>> bytes to read by implementing the available() method, to avoid the >>>>>>>>> inputstream ever returning -1. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> It's possible, but I have a hard time understanding the issue. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> The issue is that InputStreamReader reads 8192 bytes from >>>>>>> IntermediateInputStream on the first go. It then translates them >>>>>>> into 2734 chars, but thinks that the last few bytes represent an >>>>>>> incomplete char, so holds onto them. On the next call, >>>>>>> IntermediateInputStream returns -1, so InputStreamReader outputs the >>>>>>> last char as best it can (resulting in returning 1). Then the >>>>>>> IntermediateInputStream buffer is reset, and it can continue on >>>>>>> reading (but from the wrong position, resulting in corruption). >>>>>>> >>>>>>> Filip's patch is inelegant (better would be to use the ByteChunk >>>>>>> sink), but other than my looking for a better way to do it, I can't >>>>>>> come up with the required technical reason to porting the base of it >>>>>>> to 5.5 (of course, I could care less what he does in his sandbox :). >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> I've committed the fix to 5.5, if you find a more elegant way of >>>>>> solving the actual problem, feel free to revert it and commit another >>>>>> fix. I don't care about the how, as long as there is a fix that will >>>>>> be included in the tag 5.5.25 on Friday >>>>>> >>>>>> >>>>>> >>>>>> >>>>> No problem. I can see how to do this better, but I'll wait until the >>>>> weekend to commit (since it's not totally trivial, I don't want a >>>>> one-day window for regression testing :). That way 5.5.25 can go out >>>>> with your patch. It doesn't include the NIO dependancy (which was my >>>>> only concern), so it works well enough for me for now. >>>>> >>>>> >>>>> >>>> according to the KISS principle, your fix would have to be less than 4 >>>> lines changed to be "more elegant" :) >>>> >>>> >>>> >>> Yes, it is more than 4 lines, but most of them are deletes :). I've done >>> it already on my local machine here, in case anybody wants RTC on the >>> 5.5.x branch (and Filip's test case passes with flying colors :). I'm >>> pretty much sure that there are no regressions for 5.5.x+, but I still >>> need to look at 3.3.x, and 4.1.x. >>> >>> If anyone is interested, I can post the patch files. Otherwise, I'll >>> assume that CTR is still in place, and you can veto it when I commit over >>> the w/e ;). Of course, if this message was meant as a pre-emptive veto, >>> then I won't bother. >>> >>> >> it's your choice if you want to commit it before or after the tag today. >> If you wanna commit it before, then we are counting on your vote :) >> >> > > I've noticed a problem with using Reader.mark with multi-byte charsets (we > have a hack in place that works for single-byte charsets). I could just > commit what I've got here (which should be no worse than before :), but I'd > like to solve this once and for all first. > > Using Filip's example servlet, if you modify it to do: > reader = request.getReader(); > + reader.mark(5); // content length + terminator > while (true) { > int c = reader.read(); > if (c == -1 || c == '/') > break; > buf.append((char)c); > } > + reader.reset(); // throws IOException here > > With the current code (and what I have), the first call to reader.read > requests 8192 chars, and produces 2734 chars. The current code then results > in throwing away the last 2729 chars and abandoning the mark. The best I've > got until now preserves the 2729 chars, but still throws away the mark, and > hence still throws an IOE when reset is called. > > Long story short, I'm not now sure that I can promise to commit a fix this > weekend :(. > no worries, since UTF-8 can be anywhere between 1 and 6 bytes, wouldn't it just be easier to do boolean markSupported() { return false; } and not worry about parsing the bytes correctly during a mark? the problem you are describing goes beyond that though, so take your time. Filip --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org For additional commands, e-mail: dev-help@tomcat.apache.org