commons-dev mailing list archives

From Jaime Hablutzel Egoavil <hablutz...@gmail.com>
Subject Wrong processing for encoding of Content-disposition 'filename' parameter
Date Fri, 08 Mar 2013 15:38:50 GMT
I see that commons fileupload uses a 'headerEncoding' variable, whose
Javadoc explains:

> Specifies the character encoding to be used when reading the headers of
> individual part. When not specified, or null, the request encoding is used.
> If that is also not specified, or null, the platform default encoding is
> used.
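
Read as code, that fallback order looks roughly like the sketch below. The
class and method names (HeaderEncodingFallback, resolve) are hypothetical and
only illustrate the documented order; they are not fileupload's actual API.

```java
import java.nio.charset.Charset;

public class HeaderEncodingFallback {

    // Hypothetical helper mirroring the Javadoc: explicit header encoding
    // first, then the request encoding, then the platform default.
    static Charset resolve(String headerEncoding, String requestEncoding) {
        if (headerEncoding != null) {
            return Charset.forName(headerEncoding);
        }
        if (requestEncoding != null) {
            return Charset.forName(requestEncoding);
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // No header encoding configured, so the request encoding wins.
        System.out.println(resolve(null, "UTF-8"));
    }
}
```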


Well, this headerEncoding is also responsible for decoding the 'filename'
parameter value, but I can see that RFC 1867 (the RFC you implement)
says:

>    The original local file name may be supplied as well, either as a
>    'filename' parameter either of the 'content-disposition: form-data'
>    header or in the case of multiple files in a 'content-disposition:
>    file' header of the subpart. The client application should make best
>    effort to supply the file name; if the file name of the client's
>    operating system is not in US-ASCII, the file name might be
>    approximated or encoded using the method of RFC 1522.  This is a
>    convenience for those cases where, for example, the uploaded files
>    might contain references to each other, e.g., a TeX file and its .sty
>    auxiliary style description.


So the filename parameter value should only be decoded as US-ASCII or with
the method described in RFC 1522 (encoded words), but you simply decode it
with a custom 'headerEncoding'. Any clarification would be useful to me: why
are you processing headers like that?
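
To make the RFC 1522 method concrete, here is a minimal sketch of decoding
encoded words of the form =?charset?B-or-Q?text?= in a filename value. The
class EncodedWordSketch is hypothetical, not part of fileupload; it assumes
Java 8+ for java.util.Base64 and does not validate malformed input.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodedWordSketch {

    // =?charset?encoding?encoded-text?=  (encoding is B or Q)
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?([bBqQ])\\?([^?]*)\\?=");

    // Replaces every encoded word in the value; other text is left as-is.
    static String decode(String value) {
        Matcher m = ENCODED_WORD.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Charset cs = Charset.forName(m.group(1));
            byte[] bytes = m.group(2).equalsIgnoreCase("B")
                    ? Base64.getDecoder().decode(m.group(3))
                    : decodeQ(m.group(3));
            m.appendReplacement(out,
                    Matcher.quoteReplacement(new String(bytes, cs)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // "Q" encoding: '_' means space, "=XX" is a hex-encoded byte.
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                buf.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                buf.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                buf.write(c);
            }
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(decode("=?UTF-8?Q?caf=C3=A9.txt?="));
    }
}
```

With this approach the charset travels inside the header value itself, so no
external 'headerEncoding' setting is needed to interpret it.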

Take a look at this issue too and see if you can reopen it:

https://issues.apache.org/jira/browse/FILEUPLOAD-56#comment-13597224


PS: Chrome and Firefox don't seem to follow this spec either, as they
encode the 'filename' parameter (of a multipart/form-data request) with the
page encoding (or the form's accept-charset), thus producing headers with raw
UTF-8, or whatever encoding was chosen for the page, in some cases.
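
A small sketch of why this matters: if the browser writes the filename as
raw UTF-8 bytes, the result depends entirely on which charset the server
uses to read the header. RawFilenameBytes and its methods are hypothetical
names used only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RawFilenameBytes {

    // Simulates a browser on a UTF-8 page: the filename goes into the
    // part header as raw UTF-8 bytes, with no RFC 1522 encoding.
    static byte[] browserHeader(String filename) {
        String header = "Content-Disposition: form-data; name=\"file\"; "
                + "filename=\"" + filename + "\"";
        return header.getBytes(StandardCharsets.UTF_8);
    }

    // Simulates the server side: the header bytes are decoded with
    // whatever charset has been configured as the header encoding.
    static String decodeHeader(byte[] rawHeader, Charset headerEncoding) {
        return new String(rawHeader, headerEncoding);
    }

    public static void main(String[] args) {
        byte[] raw = browserHeader("caf\u00e9.txt");
        // Matching charset: the file name survives.
        System.out.println(decodeHeader(raw, StandardCharsets.UTF_8));
        // Mismatched charset: each UTF-8 byte becomes its own character.
        System.out.println(decodeHeader(raw, StandardCharsets.ISO_8859_1));
    }
}
```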



-- 
Jaime Hablutzel -  RPC 987608463
