hc-dev mailing list archives

From Dhruvakumar P G <dhruvakumar....@oracle.com>
Subject Re: Content-Encoding header is missing in httpclient's response
Date Tue, 17 Dec 2013 06:46:24 GMT

On 12/9/2013 7:43 PM, Oleg Kalnichevski wrote:
> On Mon, 2013-12-09 at 19:15 +0530, Dhruvakumar P G wrote:
>> On 12/9/2013 4:41 PM, Oleg Kalnichevski wrote:
>>> On Mon, 2013-12-09 at 13:09 +0530, Dhruvakumar P G wrote:
>>>> Hello,
>>>>
>>>> I'm in the middle of upgrading the HttpClient, mime and core
>>>> libraries to the latest version, and I haven't been able to find a
>>>> solution to the following problem.
>>>> HttpClient downloads a text file (icité Àâqë-withmultibytechars.txt)
>>>> which contains multibyte characters from another server and sends
>>>> it to the browser.
>>>> *The server returns the response headers as below :*
>>>>
>>>> HTTP/1.1 200 OK
>>>> X-Powered-By: Servlet/2.5
>>>> Content-Disposition: attachment; filename="icité Àâqë-withmultibytechars.txt"
>>>> Content-Type: application/octet-stream
>>>> Content-Length: 162
>>>>
>>>> *Browser receives the headers as below and shows the filename correctly:*
>>>>
>>>> Content-Disposition    attachment; filename="icité Àâqë-withmultibytechars.txt"
>>>> Content-Type    application/octet-stream
>>>> Transfer-Encoding    chunked
>>>>
>>>> In the second case, HttpClient downloads an image file (ウェ.jpg)
>>>> from another server and sends it to the browser.
>>>> *The server returns the response headers as below:*
>>>> HTTP/1.1 200 OK
>>>> X-Powered-By: Servlet/2.5
>>>> Content-Disposition: attachment; filename="ウェ.jpg"
>>>> Content-Encoding: gzip
>>>> Content-Type: application/octet-stream
>>>> Transfer-Encoding: chunked
>>>>
>>>> Even though the server returns the "Content-Encoding: gzip" header,
>>>> the response object doesn't have it. Somehow the header is removed
>>>> from the response when the request is executed:
>>>> _response = _httpClient.execute(_httpHost, _httpMethod, _httpContext);
>>>>
>>>> *The browser does not receive this header, and the non-ASCII
>>>> characters in the filename are not recognized in the download
>>>> dialogue, which just shows empty characters:*
>>>> Content-Disposition    attachment; filename="   .jpg"
>>>> Content-Type    application/octet-stream
>>>> Transfer-Encoding    chunked, chunked
>>>>
>>>> Am I missing something here? How do I make sure that HttpClient
>>>> doesn't ignore this header and the browser shows the filename correctly?
>>>>
>>> HTTP message headers may not contain non-ASCII characters per the
>>> requirements of the HTTP protocol. The target server is in violation
>>> of the HTTP specification.
>> Yes indeed, the target server should return an encoded filename:
>> *Content-disposition: attachment; filename="=?utf-8?B?44Km44KnLmpwZw==?="*
>> Instead it returns the unencoded filename:
>> Content-Disposition: attachment; filename="ウェ.jpg"
>> Is there no way to resolve my issue unless the target server returns
>> an encoded filename?
>>
>> Thanks,
>> Dhruva
>>> One can force HttpClient, though, to use a non-standard charset for HTTP
>>> messages by using a custom ConnectionConfig.
>>>
>>> Oleg
>>>
>> I have set the charset to UTF-8:
>> connectionConfigBuilder.setCharset(Consts.UTF_8)
>> Will setting the charset to any other value stop HttpClient from
>> losing the 'Content-Encoding' response header?
>>
> I am not aware of a single confirmed case of HttpClient losing headers.
> You can use wire / context logging to see what data packets are
> transmitted across the wire.
>
> Oleg
Hello,
To narrow down the problem, I have disabled compression on the target
server, so it no longer returns a Content-Encoding header.

The target server always returns the non-ASCII filename unencoded in
the header (Content-Disposition: attachment; filename="ウェ.jpg"),
which violates the HTTP specification. My requirement is to show the
multibyte filename in the download dialogue of every browser, without
losing any character of the name.
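
For reference, the encoded form quoted earlier in the thread
(=?utf-8?B?44Km44KnLmpwZw==?=) is an RFC 2047 style encoded-word: the
UTF-8 bytes of the filename in Base64, wrapped in =?utf-8?B?...?=. The
sketch below reproduces it with the JDK only. The class and method names
are made up for illustration, and whether a given browser decodes this
form inside Content-Disposition is not established in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative helper, not part of HttpClient.
final class EncodedWordSketch {
    private static final String PREFIX = "=?utf-8?B?";
    private static final String SUFFIX = "?=";

    // Wraps the Base64 of the UTF-8 bytes in an RFC 2047 style encoded-word.
    static String encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getEncoder().encodeToString(utf8) + SUFFIX;
    }

    // Reverses encode(); only handles the utf-8 / Base64 variant above.
    static String decode(String word) {
        boolean wellFormed = word.length() >= PREFIX.length() + SUFFIX.length()
                && word.startsWith(PREFIX) && word.endsWith(SUFFIX);
        if (!wellFormed) {
            throw new IllegalArgumentException("not a utf-8 Base64 encoded-word");
        }
        String payload = word.substring(PREFIX.length(), word.length() - SUFFIX.length());
        return new String(Base64.getDecoder().decode(payload), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+30A6 U+30A7 are the two katakana characters of the .jpg name in the thread.
        System.out.println(encode("\u30A6\u30A7.jpg"));
    }
}
```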

With the earlier version of HttpClient (4.0.1), the target server
returns the non-ASCII filename unencoded, as below:
Content-Disposition: attachment; filename="ウェ - multibyte.txt"
Content-Type: text/plain;charset=utf-8

In the HttpClient 4.0.1 response, the filename appears in a somewhat
encoded form, as below:
Content-Disposition    attachment; filename="ウェ - multibyte.txt"
Content-Type    text/plain;charset=utf-8

As a result, the browser is able to decode the filename and show it
correctly in the download dialogue.

In the HttpClient 4.3.1 response, however, the filename is exactly what
the target server sent. Unlike with 4.0.1, it is not changed into any
encoded form:
Content-Disposition: attachment; filename="ウェ - multibyte.txt"
Content-Type: text/plain;charset=utf-8

As a result, the browser cannot show the filename correctly. The
download dialogue shows *'- multibyte.txt'* and the response headers in
Firebug show:
Content-Disposition    attachment; filename=" - multibyte.txt"
Content-Type    text/plain;charset=utf-8
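
One possible explanation for the difference, offered as an assumption
rather than something confirmed in this thread: if the relaying code
reads the header bytes one byte per character (as ISO-8859-1 does) and
the servlet container writes header values back as ISO-8859-1, the
original UTF-8 bytes reach the browser unchanged. If the header is
instead decoded as UTF-8 into real Unicode characters, those characters
have no ISO-8859-1 representation and are lost on the way out. The
JDK-only sketch below models both paths; the class name, the method name
and the choice of ISO-8859-1 on the output side are illustrative
assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Models a relay: decode header bytes from the wire, then write them out again.
final class HeaderCharsetRoundTrip {

    // readAs: charset used to decode the incoming header bytes.
    // The outgoing side is assumed (not confirmed) to encode as ISO-8859-1.
    static byte[] relay(byte[] wireBytes, Charset readAs) {
        String headerValue = new String(wireBytes, readAs);
        return headerValue.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+30A6 U+30A7 followed by ".jpg", sent by the server as raw UTF-8 bytes.
        byte[] wire = "\u30A6\u30A7.jpg".getBytes(StandardCharsets.UTF_8);

        // Byte-per-character decoding: the UTF-8 bytes pass through untouched.
        byte[] passThrough = relay(wire, StandardCharsets.ISO_8859_1);
        System.out.println(java.util.Arrays.equals(wire, passThrough));

        // UTF-8 decoding: the katakana cannot be written as ISO-8859-1.
        byte[] lossy = relay(wire, StandardCharsets.UTF_8);
        System.out.println(new String(lossy, StandardCharsets.US_ASCII));
    }
}
```

Java's String.getBytes substitutes '?' for characters it cannot map; the
container in this thread appears to drop them instead, so the exact
output differs even though the loss is of the same kind.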



Is this change in behaviour from 4.0 to 4.3 expected?
If so, *how do I make sure that the multibyte filename is displayed
correctly across all browsers, given that the target server always
returns it in unencoded form?*
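
One approach that does not depend on the target server, sketched here
under stated assumptions: rebuild the Content-Disposition header in the
relaying code before sending it to the browser, using the filename*
parameter defined in RFC 6266 with RFC 5987 percent-encoding, plus a
plain ASCII filename as a fallback. The class and method names below are
hypothetical, the code uses the JDK only, and it assumes the relaying
code already holds the correctly decoded filename as a Java String.
Behaviour in older browsers would need to be checked.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper, not part of HttpClient or the Servlet API.
final class ContentDispositionSketch {
    // attr-char from RFC 5987: characters that may appear unescaped.
    private static final String ATTR_CHARS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$&+-.^_`|~";

    // Percent-encodes every UTF-8 byte that is not an attr-char.
    static String encodeExtValue(String value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && ATTR_CHARS.indexOf((char) v) >= 0) {
                sb.append((char) v);
            } else {
                sb.append('%').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    // ASCII-only fallback: anything unsafe in a quoted-string becomes '_'.
    static String asciiFallback(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean safe = c >= 0x20 && c < 0x7F && c != '"' && c != '\\';
            sb.append(safe ? c : '_');
        }
        return sb.toString();
    }

    // Builds a header value carrying both the fallback and the extended form.
    static String attachment(String fileName) {
        return "attachment; filename=\"" + asciiFallback(fileName)
                + "\"; filename*=UTF-8''" + encodeExtValue(fileName);
    }

    public static void main(String[] args) {
        // U+30A6 U+30A7 are the katakana characters from the thread's example.
        System.out.println(attachment("\u30A6\u30A7 - multibyte.txt"));
    }
}
```

The resulting string would then be set as the Content-Disposition header
of the response sent to the browser, replacing the value copied from the
target server.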


Thanks & Regards,
Dhruva

