Date: Sat, 15 Feb 2014 16:23:20 +0000 (UTC)
From: "Sebastiano Vigna (JIRA)"
To: dev@hc.apache.org
Subject: [jira] [Created] (HTTPCLIENT-1461) GZIP decoding is very slow

Sebastiano Vigna created HTTPCLIENT-1461:
--------------------------------------------

             Summary: GZIP decoding is very slow
                 Key: HTTPCLIENT-1461
                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1461
             Project: HttpComponents HttpClient
          Issue Type: Bug
          Components: HttpClient
    Affects Versions: 4.3.2
            Reporter: Sebastiano Vigna
            Priority: Critical

In 4.3.1, LazyDecompressingInputStream was introduced. However, LazyDecompressingInputStream subclasses InputStream without overriding the multi-byte read() method, and the inherited method does a byte-by-byte read.
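The effect can be reproduced outside HttpClient with a minimal sketch. The class names below (ReadDelegationDemo, CountingStream, SingleByteWrapper, BulkWrapper) are hypothetical and are not part of HttpClient; the sketch only assumes the standard behaviour of java.io.InputStream, whose inherited read(byte[],int,int) calls read() once per byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ReadDelegationDemo {

    /** Counts every read call (single-byte or bulk) that reaches the source. */
    static class CountingStream extends InputStream {
        private final InputStream in;
        int calls;

        CountingStream(InputStream in) { this.in = in; }

        @Override
        public int read() throws IOException {
            calls++;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return in.read(b, off, len);
        }
    }

    /** Hypothetical wrapper that overrides only the single-byte read(). */
    static class SingleByteWrapper extends InputStream {
        protected final InputStream wrapped;

        SingleByteWrapper(InputStream wrapped) { this.wrapped = wrapped; }

        @Override
        public int read() throws IOException {
            return wrapped.read();
        }
    }

    /** The same wrapper, with the bulk read forwarded as well. */
    static class BulkWrapper extends SingleByteWrapper {
        BulkWrapper(InputStream wrapped) { super(wrapped); }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return wrapped.read(b, off, len);
        }
    }

    /** Drains 'size' bytes through a wrapper; returns the calls seen by the source. */
    static int callsToDrain(boolean bulk, int size) {
        CountingStream source =
                new CountingStream(new ByteArrayInputStream(new byte[size]));
        InputStream wrapper =
                bulk ? new BulkWrapper(source) : new SingleByteWrapper(source);
        byte[] buffer = new byte[8192];
        try {
            while (wrapper.read(buffer, 0, buffer.length) != -1) {
                // discard the data; only the call count matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        System.out.println("single-byte wrapper: " + callsToDrain(false, 4096) + " calls");
        System.out.println("bulk wrapper:        " + callsToDrain(true, 4096) + " calls");
    }
}
```

With the single-byte wrapper the source sees at least one call per byte, while the bulk wrapper needs only a handful of calls for the same data. In the real case each of those per-byte calls ends in a native inflate call, which is far more expensive than the in-memory read used here.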
This is a trace showing what happens:

    java.util.zip.Inflater.inflateBytes(Inflater.java:Unknown line)
    java.util.zip.Inflater.inflate(Inflater.java:259)
    java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
    java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
    java.util.zip.InflaterInputStream.read(InflaterInputStream.java:122)
    org.apache.http.client.entity.LazyDecompressingInputStream.read(LazyDecompressingInputStream.java:56)
    java.io.InputStream.read(InputStream.java:179)
    it.unimi.di.law.warc.util.InspectableCachedHttpEntity.copyContent(InspectableCachedHttpEntity.java:67)

copyContent() calls read(byte[],int,int) to fill a buffer, but since LazyDecompressingInputStream doesn't override that method, the call lands in the byte-by-byte implementation inherited from InputStream. For each byte, that implementation calls the one-byte read() method of LazyDecompressingInputStream, which invokes the one-byte read() method of InflaterInputStream, which does a multi-byte, length-one read on GZIPInputStream, which makes a similar call on InflaterInputStream, which in turn performs the read using the native inflateBytes() method.

Thus, for each byte there is a native-method call. The result is a 10x-50x increase in CPU usage, which turns into a 10x-50x decrease in speed if, as in our case, you have 7000 threads downloading in parallel.

Overriding read(byte[],int,int) in LazyDecompressingInputStream will solve the problem:

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // Create the decompressing stream on first use, then delegate the bulk read.
        initWrapper();
        return wrapperStream.read(b, off, len);
    }

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)