hc-dev mailing list archives

From "Yihua Huang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HTTPCLIENT-1432) Lazy decompressing of HttpEntity.getContent()
Date Tue, 05 Nov 2013 16:14:17 GMT
Yihua Huang created HTTPCLIENT-1432:

             Summary: Lazy decompressing of HttpEntity.getContent()
                 Key: HTTPCLIENT-1432
                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1432
             Project: HttpComponents HttpClient
          Issue Type: Improvement
          Components: HttpClient
    Affects Versions: 4.3.1, 4.3.2
            Reporter: Yihua Huang
            Priority: Minor

In 4.3, DecompressingEntity is used to decompress the entity of an HTTP response. When we call
DecompressingEntity.getContent(), a new DeflateInputStream or GZIPInputStream is created,
and the header of the compressed data is read and checked right away:

    // GZIP case: the GZIPInputStream constructor reads and checks the header immediately
    InputStream decorate(final InputStream wrapped) throws IOException {
        return new GZIPInputStream(wrapped);
    }
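To show why merely obtaining the content stream can fail, here is a small standalone check. It is an illustration only, not HttpClient code, and the class and method names are made up. It relies on the fact that java.util.zip.GZIPInputStream reads the GZIP header inside its constructor, so wrapping an empty body throws EOFException before read() is ever called.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public class EagerGzipHeaderDemo {

    /** Returns true if merely constructing a GZIPInputStream over body throws EOFException. */
    public static boolean constructorThrowsEof(byte[] body) {
        try {
            // The header is read and validated here, inside the constructor
            new GZIPInputStream(new ByteArrayInputStream(body)).close();
            return false;
        } catch (EOFException e) {
            // Empty (or truncated) input: the header could not be read at all
            return true;
        } catch (IOException e) {
            // Non-empty input with a bad header fails differently (ZipException)
            return false;
        }
    }

    public static void main(String[] args) {
        // An empty body, as in a response that declares gzip but sends no data
        System.out.println(constructorThrowsEof(new byte[0])); // prints true
    }
}
```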

In some cases we don't actually need to decompress anything. For example, a request to "http://baike.baidu.com/search/word?word=httpclient&pic=1&sug=1&enc=utf8"
sometimes returns a 302 response that carries the header "Content-Encoding: gzip" but no entity
data at all. RedirectExec.execute() does not read the entity, but at the end it still tries to
close the input stream by calling EntityUtils.consume(response.getEntity()). That call invokes
entity.getContent(), the header check runs against the empty stream, an EOFException is thrown,
and the redirect cannot continue.

In this case we don't care about the actual entity, even if its compression format is invalid.

In my opinion, the decompressing stream should be created, and its format checked, ONLY when
the content is actually read, not merely when the stream is closed. So I wrote
LazyDecompressingInputStream, a wrapper that defers creating the decompressing stream until
read() is first called. With this change, more websites will be supported.
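The idea can be sketched roughly as follows. This is not the attached patch: the DecoratorFactory interface and the helper method name are hypothetical, and the sketch assumes the caller supplies the real decoder (for example GZIPInputStream::new). Nothing touches the compressed header until the first read, and close() simply closes whichever stream exists.

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch of a lazy decompressing wrapper; not HttpClient's actual source.
 * The decoding stream is created on the first read, never on close.
 */
public class LazyDecompressingInputStream extends InputStream {

    /** Hypothetical factory that wraps the raw stream, e.g. GZIPInputStream::new. */
    public interface DecoratorFactory {
        InputStream create(InputStream wrapped) throws IOException;
    }

    private final InputStream wrapped;
    private final DecoratorFactory factory;
    private InputStream decorated; // stays null if the content is never read

    public LazyDecompressingInputStream(InputStream wrapped, DecoratorFactory factory) {
        this.wrapped = wrapped;
        this.factory = factory;
    }

    private InputStream decoratedStream() throws IOException {
        if (decorated == null) {
            // The compression header is read and checked only at this point
            decorated = factory.create(wrapped);
        }
        return decorated;
    }

    @Override
    public int read() throws IOException {
        return decoratedStream().read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return decoratedStream().read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        // Closing never forces the header check
        if (decorated != null) {
            decorated.close(); // also closes the wrapped stream
        } else {
            wrapped.close();
        }
    }
}
```

With this shape, a consume-and-close path over an empty 302 body closes the raw stream without ever constructing a GZIPInputStream, while a real read of a corrupt body still surfaces the bad header as an IOException.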


This message was sent by Atlassian JIRA
