hc-dev mailing list archives

From Ortwin Glück <...@odi.ch>
Subject Re: [HTTPClient 3.0.1] Bug: Multipart posts with files named using UTF-8 characters
Date Thu, 19 Oct 2006 12:38:07 GMT
Przemek,

The filename is sent in a header of the MIME part. As I interpret the 
HTTP specs, headers are restricted to the ASCII character set. Since this 
is not a real HTTP header but rather a header-like structure in the HTTP 
message body, chances are that we misinterpreted the specs.
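
To illustrate what the ASCII restriction does to such a filename, here is a 
minimal sketch in plain Java. It does not use HttpClient; the class 
DispositionDemo, the helper dispositionBytes and the sample filename are 
made up for this example. It relies on String.getBytes(Charset) replacing 
unmappable characters with '?', which matches the question marks reported.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DispositionDemo {

    // Hypothetical helper: builds the bytes of a part's Content-Disposition
    // line. Only the filename is encoded with the given charset; the rest
    // of the line is plain ASCII.
    public static byte[] dispositionBytes(String fieldName, String filename, Charset charset) {
        byte[] prefix = ("Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"").getBytes(StandardCharsets.US_ASCII);
        byte[] name = filename.getBytes(charset);
        byte[] suffix = "\"".getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(prefix, 0, prefix.length);
        out.write(name, 0, name.length);
        out.write(suffix, 0, suffix.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Sample filename with four non-ASCII letters, written as escapes
        // so the source file itself stays ASCII.
        String filename = "za\u017C\u00F3\u0142\u0107.txt";

        byte[] ascii = dispositionBytes("file", filename, StandardCharsets.US_ASCII);
        byte[] utf8 = dispositionBytes("file", filename, StandardCharsets.UTF_8);

        // Each non-ASCII letter has been replaced by '?':
        // Content-Disposition: form-data; name="file"; filename="za????.txt"
        System.out.println(new String(ascii, StandardCharsets.US_ASCII));

        // In UTF-8 each of the four letters takes two bytes instead of one.
        System.out.println(utf8.length - ascii.length); // 4
    }
}
```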

I haven't tried it myself. What happens when you send the form below with a 
browser? Does it set the Content-Encoding header of the HTTP message to 
UTF-8? Because the server must have a way to know in which encoding to 
interpret the MIME header if it's not the default ASCII.
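
The receiving side of that concern can be sketched the same way. This is 
only an illustration in plain Java, not FileUpload or any real server code; 
FilenameDecodeDemo and decode are hypothetical names. It shows that the 
same bytes give back the original filename only when the server assumes 
the charset the client actually used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FilenameDecodeDemo {

    // Hypothetical server-side step: turn the raw filename bytes taken from
    // the MIME part header into a String, using whatever charset the server
    // assumes.
    public static String decode(byte[] raw, Charset assumed) {
        return new String(raw, assumed);
    }

    public static void main(String[] args) {
        String original = "za\u017C\u00F3\u0142\u0107.txt";

        // Assumption for this sketch: the client wrote the filename as UTF-8.
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(wire, StandardCharsets.UTF_8);
        String wrong = decode(wire, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
        // ISO-8859-1 maps every byte to one character, so the 10-character
        // name comes back as 14 characters of mojibake.
        System.out.println(wrong.length()); // 14
    }
}
```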

Ortwin

Tumidajewicz, Przemyslaw wrote:
> Hello everyone,
> 
> First post here, hope I'm doing it right ;)
> 
> I've been having problems with sending multipart posts containing files 
> named using UTF-8 characters - all non-ASCII characters are turned into 
> question marks. I've tried to specify the charset when creating the 
> FilePart like this
> 
> FilePart fp = new FilePart(name, file, null, "UTF-8");
> 
> as well as setting the charset later on like this
> 
> fp.setCharSet("UTF-8");
> 
> with no result. So I took a deeper look at the HttpClient code (thank 
> god for open source!) and found that the loss of special characters 
> happens in the FilePart.sendDispositionHeader method, at line
> 
> out.write(EncodingUtil.getAsciiBytes(filename));
> 
> where the filename is forced to fit into the US-ASCII charset.
> 
> My workaround for this problem is to substitute the above line with a 
> charset-aware version:
> 
> out.write(EncodingUtil.getBytes(filename, getCharSet()));
> 
> but I'm not sure if it's the correct way to do it.
> 
> What I'm quite sure of at this point is that it works for UTF-8 and 
> results are consistent with what I get out of IE6 when posting the same 
> file using a form like this:
> 
> <form action="http://localhost:1235" method="POST" 
> enctype="multipart/form-data" accept-charset="UTF-8">
> <input type="file" name="file"></input>
> <input type="submit"></input>
> </form>
> 
> It's also parsed correctly by FileUpload 1.1.
> 
> I've had a look at the HTTPClient 3.1-alpha1 source and the problematic 
> line in FilePart looks the same - which means that either my fix is a 
> heresy and/or there is a better way of doing this - or that this bug has 
> not been reported before (I failed to find anything on this in the 
> archive).
> 
> Please let me know if this is the right way of fixing this problem and 
> if so, will this fix make it into HTTPClient 3.1?
> 
> Thanks and best regards!
> --Przemek
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: httpclient-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: httpclient-dev-help@jakarta.apache.org
> 

-- 
[web]  http://www.odi.ch/
[blog] http://www.odi.ch/weblog/
[pgp]  key 0x81CF3416
        finger print F2B1 B21F F056 D53E 5D79 A5AF 02BE 70F5 81CF 3416
