Subject: RE: Performance issues in ChunkedInputStream
From: Oleg Kalnichevski
To: HttpClient User Discussion
Date: Fri, 13 Apr 2007 21:53:52 +0200

On Fri, 2007-04-13 at 09:13 -0700, Igor Lubashev wrote:
> Oleg, I've not only done 0 benchmarking, but I've done 0 testing on this
> class. It's just something I put together to throw into the discussion.
> I make no further claims than this. In fact, I prefaced this post with
> "It is hard to believe that reading a byte at a time is a bottleneck".
>
> Now, when you are saying that "most performance gains came
> chiefly from [...] elimination of unnecessary synchronization", do you
> mean only contested synchronization points or uncontested ones, too?
>
> If even uncontested synchronization is undesirable, then
> LineReaderInputStream.readLine() eliminates the need to repeatedly call
> BufferedInputStream.read(), which is synchronized.

Igor,

Uncontested synchronization is clearly undesirable if it is completely
pointless. StringBuffer, which is used quite extensively throughout the
HttpClient 3.x codebase, is synchronized for some stupid reason.
Instances of StringBuffer are usually short-lived and are accessed by a
single thread only (who in their right mind would want to concatenate
strings from multiple threads anyway?). So synchronization (even
uncontested) on an instance of StringBuffer is a waste of CPU cycles.
Yet there are places where a StringBuffer gets filled character by
character. The same goes for BufferedInputStream.

The point I am trying to make is simple.
If anyone is prepared to submit a fully tested patch that applies
cleanly against the SVN trunk and provides sufficient test coverage for
the new functionality, I'll happily check it in. I have already done all
this hard work for HttpClient 4.0, including writing test cases. I have
no interest whatsoever in repeating this work for HttpClient 3.1.

Oleg

> - Igor
>
>
> -----Original Message-----
> From: Oleg Kalnichevski [mailto:olegk@apache.org]
> Sent: Thursday, April 12, 2007 6:01 PM
> To: HttpClient User Discussion
> Subject: Re: Performance issues in ChunkedInputStream
>
> Igor Lubashev wrote:
> > 1. BufferedInputStream is working fine. I've looked at the source,
> > and it correctly tries to read data only when its internal buffer is
> > exhausted. Most read calls reference only the internal buffer. When
> > the data does get read from the underlying stream, it tries to read
> > it in large chunks. (Of course, if the underlying stream returns very
> > little data, it is a different problem.)
> >
> > 2. It is hard to believe that reading a byte at a time is a
> > bottleneck, but I've just quickly written a LineReaderInputStream,
> > which is derived from BufferedInputStream, so all the searching for
> > CRLF/LF happens very quickly internally. The source is attached.
> >
> > Just call the readLine() method, and you'll get Strings out of the
> > stream. You can interleave all regular stream operations and
> > readLine() calls. However, if you wish to use readLine() *after*
> > using the stream's read() methods, make sure that you do not
> > inadvertently pass this stream to anything that is buffering the
> > stream's data (or your strings may get consumed via buffering).
> >
> > - Igor
>
> Igor,
>
> With all due respect, given the implementation of the
> BufferedInputStream#read() method in Sun's JRE (see below), I just do
> not see how LineReaderInputStream should be any faster.
>
>     public synchronized int read() throws IOException {
>         if (pos >= count) {
>             fill();
>             if (pos >= count)
>                 return -1;
>         }
>         return getBufIfOpen()[pos++] & 0xff;
>     }
>
> Have you done any benchmarking comparing performance of HttpClient 3.x
> with and without the patch?
>
> I have invested a lot of effort into optimizing the low-level HTTP
> components for HttpClient 4.0 [1], and most performance gains came
> chiefly from three factors: elimination of unnecessary synchronization,
> elimination of intermediate buffer copying, and reduced garbage (thus
> reduced GC time). Performance improvements due to the improved HTTP
> header parser and chunk codec were marginal at best.
>
> Oleg
>
> [1] http://jakarta.apache.org/httpcomponents/httpcore/index.html
>
> >>>> I looked at the source for BufferedInputStream and it looks like
> >>>> it tries to fill the empty space in the buffer each time you read
> >>>> from it (for a socket connection it will read more than one packet
> >>>> of data) instead of just doing a single read from the underlying
> >>>> stream.
> >>>
> >>> Ok, then the byte-by-byte reading in CIS when parsing the chunk
> >>> header might well be the problem. If you want to fix that, you'll
> >>> have to hack deeply into CIS.
> >>> Here is what I would do if I had no other choice:
> >>>
> >>> - extend CIS by a local byte array as a buffer (needs two extra
> >>>   ints for cursor and fill size)
> >>> - change the chunk header parsing to read a bunch of bytes into the
> >>>   buffer, then parse from there
> >>> - change all read methods to return leftover bytes from the buffer
> >>>   before calling a read on the underlying stream
> >>>
> >>> hope that helps,
> >>> Roland
> >>
> >> Tony and Roland,
> >>
> >> I suspect rather strongly it is BufferedInputStream that needs
> >> fixing, not ChunkedInputStream
> >>
> >> Oleg

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org
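
For readers of the archive, the three steps Roland lists in the quoted
text (a local byte array with a cursor and fill size, chunk header
parsing from that buffer, and read methods that hand back leftover bytes
first) might look roughly like the sketch below. It is only an
illustration under stated assumptions: the names LocalBufferSketch,
LocalBufferInput and readFirstChunk are hypothetical and do not come
from HttpClient, the 2048-byte buffer size is arbitrary, chunk
extensions are discarded, and only the first chunk is decoded. Nothing
here is synchronized and the line is built in a StringBuilder, in line
with Oleg's remarks about uncontested locking.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: none of these names come from HttpClient.
final class LocalBufferSketch {

    /** Unsynchronized reader with a local buffer, a cursor and a fill size. */
    static final class LocalBufferInput {
        private final InputStream in;
        private final byte[] buf = new byte[2048]; // size is an arbitrary assumption
        private int pos;   // cursor: index of the next unread byte in buf
        private int count; // fill size: number of valid bytes in buf

        LocalBufferInput(InputStream in) {
            this.in = in;
        }

        /** Refills the buffer with one bulk read; returns false at end of stream. */
        private boolean fill() throws IOException {
            pos = 0;
            count = 0;
            int n = in.read(buf, 0, buf.length);
            if (n < 0) {
                return false;
            }
            count = n;
            return true;
        }

        /** Reads one line ending in CRLF or LF from the local buffer; null at end of stream. */
        String readLine() throws IOException {
            StringBuilder line = new StringBuilder(); // unsynchronized, unlike StringBuffer
            boolean sawData = false;
            while (true) {
                if (pos >= count && !fill()) {
                    return sawData ? line.toString() : null;
                }
                while (pos < count) {
                    sawData = true;
                    char c = (char) (buf[pos++] & 0xff);
                    if (c == '\n') {
                        int len = line.length();
                        if (len > 0 && line.charAt(len - 1) == '\r') {
                            line.setLength(len - 1); // drop the CR of a CRLF pair
                        }
                        return line.toString();
                    }
                    line.append(c);
                }
            }
        }

        /** Returns leftover buffered bytes first, then reads from the underlying stream. */
        int read(byte[] dst, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            if (pos < count) {
                int n = Math.min(len, count - pos);
                System.arraycopy(buf, pos, dst, off, n);
                pos += n;
                return n;
            }
            return in.read(dst, off, len);
        }
    }

    /** Decodes only the first chunk of a chunked body; chunk extensions are ignored. */
    static String readFirstChunk(byte[] wire) throws IOException {
        LocalBufferInput in = new LocalBufferInput(new ByteArrayInputStream(wire));
        String header = in.readLine();
        if (header == null) {
            throw new EOFException("missing chunk header");
        }
        int semi = header.indexOf(';');
        String hex = (semi >= 0 ? header.substring(0, semi) : header).trim();
        int size = Integer.parseInt(hex, 16); // chunk size is hexadecimal
        byte[] data = new byte[size];
        int off = 0;
        while (off < size) {
            int n = in.read(data, off, size - off);
            if (n < 0) {
                throw new EOFException("truncated chunk");
            }
            off += n;
        }
        return new String(data, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "5\r\nhello\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readFirstChunk(wire)); // prints: hello
    }
}
```

Whether this would actually be faster than the existing byte-by-byte
parsing on top of BufferedInputStream is exactly the open question in
the thread; the sketch makes no performance claim and would need the
benchmarking Oleg asks for.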