commons-user mailing list archives

From Robert Priest <Robert.Pri...@bentley.com>
Subject RE: [FileUpload] Unicode Encoding for a Form
Date Fri, 19 Sep 2003 12:36:46 GMT
Thanks Paul.

When I get a chance, I'll try your code. I am swamped with other work that is
taking precedence right now.

-----Original Message-----
From: Paul Libbrecht [mailto:paul@activemath.org]
Sent: Wednesday, September 17, 2003 5:07 PM
To: Jakarta Commons Users List
Subject: Re: [FileUpload] Unicode Encoding for a Form



Hi Robert,

I presume you're a victim of the same syndrome as we had.
I have written this somewhere already but... I can't find it anymore, so
here is the issue:

- the Content-Type header only carries a value like multipart/form-data...
nothing that states the encoding of the characters in the form, in
particular how to convert the Unicode character ò to some %xx value
(making it %F2 would mean using ISO-8859-1).
- what can browsers do? Either ask the user (some browsers have this in
their preferences) or just use the same encoding as the page they
received, which is generally the wise choice...
- what can server containers do? Well... they don't know; they have no
clue which browser page all this came from, so they just convert the
bytes to a string, mapping %F2 to ò, hence giving very weird results if
UTF-8 was used...
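
To make the ambiguity above concrete, here is a minimal sketch using the standard java.net.URLEncoder and URLDecoder. The class and method names (PercentEncodingDemo, encode, decode) are made up for illustration; this is not what any browser or container runs internally, it only shows that the same character yields different %xx sequences depending on the charset, and what happens when the receiver guesses wrong.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class PercentEncodingDemo {

    // Percent-encodes text using the named charset (hypothetical helper).
    public static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Percent-decodes text, interpreting the bytes in the named charset.
    public static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String oGrave = "\u00f2"; // the character ò

        // One byte in ISO-8859-1, two bytes in UTF-8.
        System.out.println(encode(oGrave, "ISO-8859-1")); // %F2
        System.out.println(encode(oGrave, "UTF-8"));      // %C3%B2

        // A receiver that assumes ISO-8859-1 turns the UTF-8 bytes
        // into two unrelated characters (U+00C3 and U+00B2).
        String wrong = decode("%C3%B2", "ISO-8859-1");
        System.out.println(wrong.length()); // 2
    }
}
```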

We do everything in UTF-8; Russian, French, and math characters were our
interest. Our solution, once we had guessed what Tomcat was doing, was as
follows: write a little converter that wraps an
InputStreamReader(pig, "UTF-8") and read from there, with pig defined to be
something like a
ByteArrayInputStream(request.getParameter("xx").getBytes("ISO-8859-1")).
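
A rough sketch of such a converter follows. It assumes the container decoded the UTF-8 bytes as ISO-8859-1, which maps every byte to exactly one character, so the original bytes can be recovered and decoded again. The class and method names (ParamRedecoder, redecodeAsUtf8) are invented for this example, and it uses new String(bytes, charset) in place of the InputStreamReader over a ByteArrayInputStream, which should give the same result for a complete in-memory value. StandardCharsets needs Java 7 or later; on older JVMs the charset names would be passed as strings.

```java
import java.nio.charset.StandardCharsets;

public class ParamRedecoder {

    /**
     * Re-decodes a string that was read as ISO-8859-1 although the
     * underlying bytes were UTF-8. Returns null for null input.
     */
    public static String redecodeAsUtf8(String misdecoded) {
        if (misdecoded == null) {
            return null;
        }
        // ISO-8859-1 is a one-to-one byte/char mapping, so this
        // recovers the bytes exactly as they arrived on the wire.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e4\u00e4\u00e4.txt"; // "äää.txt"

        // Simulate what the container hands back when it guesses wrong.
        String misdecoded = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        System.out.println(redecodeAsUtf8(misdecoded).equals(original)); // true
    }
}
```

In a servlet this would be applied to the value returned by request.getParameter("xx"). It only helps when the bytes really arrived as UTF-8; if the browser already replaced the characters with "?", nothing is left to recover.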

Since then, we're happy.
But one day, one should file a bug on the HTML specification...

Hope that helps.

Paul



Robert Priest wrote:
> and the following does not help:
> 
>  try
>   {
>   fileName = new String(cd.substring(start + 10, end).trim().getBytes("UTF-8"));
>   }
>  catch (java.io.UnsupportedEncodingException uee)
>   {
>   }
> 
> -----Original Message-----
> From: Robert Priest [mailto:Robert.Priest@bentley.com]
> Sent: Wednesday, September 17, 2003 11:19 AM
> To: 'commons-user@jakarta.apache.org'
> Subject: [FileUpload] Unicode Encoding for a Form
> 
> 
> Hello all,
> 
> I have a simple html form which has an <INPUT TYPE="FILE"/> field in it.
> 
> Now when I select a file whose name contains Scandinavian characters (such
> as umlauts), the name is not being URL encoded properly before being sent.
> As a result, my jsp page which accepts posts of files via the FileUpload
> package is not interpreting the file name correctly.
> 
> First, has anyone seen this problem? And does anyone have a solution for
> this issue?
> 
> 
> For example, if I select a file say:
> 
> filename="C:\Documents and Settings\Robert.Priest\Desktop\äää.txt"
> 
> what is sent in the request is:
> 
> C:\Documents and Settings\Robert.Priest\Desktop\???.txt"
> 
> 
> and what is seen if you call FileItem.getName() is:
> 
> C:\Documents and Settings\Robert.Priest\Desktop\???.txt
> 
> 
> So the method FileUploadBase.getFileName(Map /* String, String */ headers)
> does not see the correct filename when it executes: 
> 
>  if (start != -1 && end != -1)
>             {
>                 fileName = cd.substring(start + 10, end).trim();
>             }
> 
> 
> The following is the multipart request that IE sends when such a file
> (with umlauts in the name) is selected:
> ------------------------------
> 
> 
> POST /jsp/upload.jsp HTTP/1.1
> Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword, application/x-shockwave-flash, */*
> Referer: http://localhost:8080/roberttest/rptest.html
> Accept-Language: en-us
> Content-Type: multipart/form-data; boundary=---------------------------7d39eb58029a
> Accept-Encoding: gzip, deflate
> User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
> Host: localhost:2000
> Content-Length: 349
> Connection: Keep-Alive
> Cache-Control: no-cache



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org
