commons-user mailing list archives

From "Ronald Klop" <ronald-freeb...@klop.yi.org>
Subject Re: FileUpload: Failure to parse non-ascii character sets
Date Tue, 21 Jun 2005 00:26:42 GMT
On Tue, 21 Jun 2005 01:10:06 +0200, Brent Kynaston <bkynaston@trivir.com>  
wrote:

> Ron,
>
> Below you'll find the headers for two separate tests.  The first test  
> does not use the PortletFileUpload class; it simply uses a standard  
> form with DocTitle data (and other form data) coming in as  
> x-www-form-urlencoded.  Note the values after "ztest3".  These values  
> are Russian Cyrillic characters that were posted successfully and stored  
> in a database.
>
> In the second test, we used the PortletFileUpload API to parse the  
> multipart/form-data.  Here we insert ztest5 into the DocTitle field,  
> followed by Cyrillic characters again.  This time, however, we do not  
> receive the proper DocTitle value from the FileUpload parser.
>
> Here is the header and data from the first test:
> *----------------------------------------------
> No.     Time        Source                Destination           Protocol  
> Info
>     332 22.560146   192.168.189.1         192.168.189.201       HTTP      
> POST  
> /GLPNetPortal/portal/portlet/COPDocuments?urlType=Action&novl-inst=c373e902f9802206764b000c296f1d50&wsrp-mode=view&wsrp-windowstate=normal&action=updateDoc&DocID=c373e90485927c3d323c000c29ccbaff
> HTTP/1.1 (application/x-www-form-urlencoded)
>
> Frame 332 (1035 bytes on wire, 1035 bytes captured)
> Ethernet II, Src: 00:50:56:c0:00:08, Dst: 00:0c:29:cc:ba:ff
> Internet Protocol, Src Addr: 192.168.189.1 (192.168.189.1), Dst Addr:  
> 192.168.189.201 (192.168.189.201)
> Transmission Control Protocol, Src Port: 3293 (3293), Dst Port: http  
> (80), Seq: 1, Ack: 1, Len: 981
> Hypertext Transfer Protocol
>     POST  
> /GLPNetPortal/portal/portlet/COPDocuments?urlType=Action&novl-inst=c373e902f9802206764b000c296f1d50&wsrp-mode=view&wsrp-windowstate=normal&action=updateDoc&DocID=c373e90485927c3d323c000c29ccbaff
> HTTP/1.1\r\n
>         Request Method: POST
>         Request URI:  
> /GLPNetPortal/portal/portlet/COPDocuments?urlType=Action&novl-inst=c373e902f9802206764b000c296f1d50&wsrp-mode=view&wsrp-windowstate=normal&action=updateDoc&DocID=c373e90485927c3d323c000c29ccbaff
>         Request Version: HTTP/1.1
>     Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,  
> application/x-shockwave-flash, */*\r\n
>     Referer:  
> http://192.168.189.201/GLPNetPortal/portal/portlet/COPDocuments?novl-inst=c373e902f9802206764b000c296f1d50\r\n
>     Accept-Language: en-us,ru;q=0.5\r\n
>     Content-Type: application/x-www-form-urlencoded\r\n
>     Accept-Encoding: gzip, deflate\r\n
>     User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;  
> .NET CLR 1.1.4322)\r\n
>     Host: 192.168.189.201\r\n
>     Content-Length: 207\r\n
>     Connection: Keep-Alive\r\n
>     Cache-Control: no-cache\r\n
>     Cookie: JSESSIONID=994b099daff9dcf29b1956bfa65f9a57\r\n
>     \r\n
> Line-based text data: application/x-www-form-urlencoded
>     DocTitle=ztest3%D1%84%D1%8B%D0%B2%D0%B0%D1%84%D1%8B%D0%B2%D0%B0&fileData=&DocDesc=Please+add+a+description&viewable=on&peer-review=on&DocID=c373e90485927c3d323c000c29ccbaff&charset=UTF-8&updateWebLink=Submit
> *----------------------------------------------
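The DocTitle value in this first capture carries its Cyrillic letters as percent-escaped UTF-8 bytes, two bytes per letter. A minimal sketch using only the JDK's java.net.URLDecoder decodes that value once as UTF-8 and, for contrast, once as ISO-8859-1; the class UrlEncodedTitle and its decode method are hypothetical names used only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper: decodes the DocTitle value from the urlencoded capture.
class UrlEncodedTitle {
    // Copied verbatim from the request body of the first test.
    static final String RAW =
        "ztest3%D1%84%D1%8B%D0%B2%D0%B0%D1%84%D1%8B%D0%B2%D0%B0";

    // Percent-decodes the value, reading the escaped bytes in the given charset.
    static String decode(String raw, String charset)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(raw, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // As UTF-8, each pair of %XX bytes forms one Cyrillic letter.
        System.out.println(decode(RAW, "UTF-8"));
        // The same 16 bytes read as ISO-8859-1 become 16 Latin-1 characters.
        System.out.println(decode(RAW, "ISO-8859-1"));
    }
}
```

Read as UTF-8, the value becomes "ztest3" followed by eight Cyrillic letters (фывафыва); read as ISO-8859-1, the same 16 bytes turn into 16 unrelated Latin-1 characters.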
>
> Here is the header and data for the second test (using  
> multipart/form-data):
> *----------------------------------------------
> No.     Time        Source                Destination           Protocol  
> Info
>    1362 466.405031  192.168.189.1         192.168.189.201       HTTP      
> POST  
> /GLPNetPortal/portal/portlet/COPDocuments?urlType=Action&novl-inst=c373e902f9802206764b000c296f1d50&wsrp-mode=view&wsrp-windowstate=normal&action=updateDoc&DocID=c373e90485927c3d323c000c29ccbaff
> HTTP/1.1
>
> Frame 1362 (866 bytes on wire, 866 bytes captured)
> Ethernet II, Src: 00:50:56:c0:00:08, Dst: 00:0c:29:cc:ba:ff
> Internet Protocol, Src Addr: 192.168.189.1 (192.168.189.1), Dst Addr:  
> 192.168.189.201 (192.168.189.201)
> Transmission Control Protocol, Src Port: 3533 (3533), Dst Port: http  
> (80), Seq: 1, Ack: 1, Len: 812
> Hypertext Transfer Protocol
>     POST  
> /GLPNetPortal/portal/portlet/COPDocuments?urlType=Action&novl-inst=c373e902f9802206764b000c296f1d50&wsrp-mode=view&wsrp-windowstate=normal&action=updateDoc&DocID=c373e90485927c3d323c000c29ccbaff
> HTTP/1.1\r\n
>         Request Method: POST
>         Request URI:  
> /GLPNetPortal/portal/portlet/COPDocuments?urlType=Action&novl-inst=c373e902f9802206764b000c296f1d50&wsrp-mode=view&wsrp-windowstate=normal&action=updateDoc&DocID=c373e90485927c3d323c000c29ccbaff
>         Request Version: HTTP/1.1
>     Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,  
> application/x-shockwave-flash, */*\r\n
>     Referer:  
> http://192.168.189.201/GLPNetPortal/portal/portlet/COPDocuments?novl-inst=c373e902f9802206764b000c296f1d50\r\n
>     Accept-Language: en-us,ru;q=0.5\r\n
>     Content-Type: multipart/form-data;  
> boundary=---------------------------7d53891713065c\r\n
>     Accept-Encoding: gzip, deflate\r\n
>     User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;  
> .NET CLR 1.1.4322)\r\n
>     Host: 192.168.189.201\r\n
>     Content-Length: 991\r\n
>     Connection: Keep-Alive\r\n
>     Cache-Control: no-cache\r\n
>     Cookie: JSESSIONID=994b099daff9dcf29b1956bfa65f9a57\r\n
>     \r\n
>
> No.     Time        Source                Destination           Protocol  
> Info
>    1363 466.405043  192.168.189.1         192.168.189.201       HTTP      
> Continuation or non-HTTP traffic
>
> Frame 1363 (1045 bytes on wire, 1045 bytes captured)
> Ethernet II, Src: 00:50:56:c0:00:08, Dst: 00:0c:29:cc:ba:ff
> Internet Protocol, Src Addr: 192.168.189.1 (192.168.189.1), Dst Addr:  
> 192.168.189.201 (192.168.189.201)
> Transmission Control Protocol, Src Port: 3533 (3533), Dst Port: http  
> (80), Seq: 813, Ack: 1, Len: 991
> Hypertext Transfer Protocol
>     Data (991 bytes)
>
> 0000  2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d   ----------------
> 0010  2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 37 64 35   -------------7d5
> 0020  33 38 39 31 37 31 33 30 36 35 63 0d 0a 43 6f 6e   3891713065c..Con
> 0030  74 65 6e 74 2d 44 69 73 70 6f 73 69 74 69 6f 6e   tent-Disposition
> 0040  3a 20 66 6f 72 6d 2d 64 61 74 61 3b 20 6e 61 6d   : form-data; nam
> 0050  65 3d 22 44 6f 63 54 69 74 6c 65 22 0d 0a 0d 0a   e="DocTitle"....
> 0060  7a 74 65 73 74 35 d1 84 d1 8b d0 b2 d0 b0 d1 84   ztest5..........
> 0070  d0 b2 d1 8b d0 b0 d1 84 d1 8b d0 b2 d0 b0 0d 0a   ................
> 0080  2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d   ----------------
> 0090  2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 37 64 35   -------------7d5
> 00a0  33 38 39 31 37 31 33 30 36 35 63 0d 0a 43 6f 6e   3891713065c..Con
> 00b0  74 65 6e 74 2d 44 69 73 70 6f 73 69 74 69 6f 6e   tent-Disposition
> 00c0  3a 20 66 6f 72 6d 2d 64 61 74 61 3b 20 6e 61 6d   : form-data; nam
> 00d0  65 3d 22 66 69 6c 65 44 61 74 61 22 3b 20 66 69   e="fileData"; fi
> 00e0  6c 65 6e 61 6d 65 3d 22 22 0d 0a 43 6f 6e 74 65   lename=""..Conte
> 00f0  6e 74 2d 54 79 70 65 3a 20 61 70 70 6c 69 63 61   nt-Type: applica
> 0100  74 69 6f 6e 2f 6f 63 74 65 74 2d 73 74 72 65 61   tion/octet-strea
> 0110  6d 0d 0a 0d 0a 0d 0a 2d 2d 2d 2d 2d 2d 2d 2d 2d   m......---------
> 0120  2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d   ----------------
> 0130  2d 2d 2d 2d 37 64 35 33 38 39 31 37 31 33 30 36   ----7d5389171306
> 0140  35 63 0d 0a 43 6f 6e 74 65 6e 74 2d 44 69 73 70   5c..Content-Disp
> 0150  6f 73 69 74 69 6f 6e 3a 20 66 6f 72 6d 2d 64 61   osition: form-da
> 0160  74 61 3b 20 6e 61 6d 65 3d 22 44 6f 63 44 65 73   ta; name="DocDes
> 0170  63 22 0d 0a 0d 0a 50 6c 65 61 73 65 20 61 64 64   c"....Please add
> 0180  20 61 20 64 65 73 63 72 69 70 74 69 6f 6e 0d 0a    a description..
> 0190  2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d   ----------------
> 01a0  2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 37 64 35   -------------7d5
> 01b0  33 38 39 31 37 31 33 30 36 35 63 0d 0a 43 6f 6e   3891713065c..Con
> 01c0  74 65 6e 74 2d 44 69 73 70 6f 73 69 74 69 6f 6e   tent-Disposition
> 01d0  3a 20 66 6f 72 6d 2d 64 61 74 61 3b 20 6e 61 6d   : form-data; nam
> 01e0  65 3d 22 76 69 65 77 61 62 6c 65 22 0d 0a 0d 0a   e="viewable"....
> 01f0  6f 6e 0d 0a 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d   on..------------
> 0200  2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d   ----------------
> 0210  2d 37 64 35 33 38 39 31 37 31 33 30 36 35 63 0d   -7d53891713065c.
> 0220  0a 43 6f 6e 74 65 6e 74 2d 44 69 73 70 6f 73 69   .Content-Disposi
> 0230  74 69 6f 6e 3a 20 66 6f 72 6d 2d 64 61 74 61 3b   tion: form-data;
> 0240  20 6e 61 6d 65 3d 22 70 65 65 72 2d 72 65 76 69    name="peer-revi
> 0250  65 77 22 0d 0a 0d 0a 6f 6e 0d 0a 2d 2d 2d 2d 2d   ew"....on..-----
> 0260  2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d   ----------------
> 0270  2d 2d 2d 2d 2d 2d 2d 2d 37 64 35 33 38 39 31 37   --------7d538917
> 0280  31 33 30 36 35 63 0d 0a 43 6f 6e 74 65 6e 74 2d   13065c..Content-
> 0290  44 69 73 70 6f 73 69 74 69 6f 6e 3a 20 66 6f 72   Disposition: for
> 02a0  6d 2d 64 61 74 61 3b 20 6e 61 6d 65 3d 22 44 6f   m-data; name="Do
> 02b0  63 49 44 22 0d 0a 0d 0a 63 33 37 33 65 39 30 34   cID"....c373e904
> 02c0  38 35 39 32 37 63 33 64 33 32 33 63 30 30 30 63   85927c3d323c000c
> 02d0  32 39 63 63 62 61 66 66 0d 0a 2d 2d 2d 2d 2d 2d   29ccbaff..------
> 02e0  2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d   ----------------
> 02f0  2d 2d 2d 2d 2d 2d 2d 37 64 35 33 38 39 31 37 31   -------7d5389171
> 0300  33 30 36 35 63 0d 0a 43 6f 6e 74 65 6e 74 2d 44   3065c..Content-D
> 0310  69 73 70 6f 73 69 74 69 6f 6e 3a 20 66 6f 72 6d   isposition: form
> 0320  2d 64 61 74 61 3b 20 6e 61 6d 65 3d 22 64 6f 63   -data; name="doc
> 0330  46 69 6c 65 4e 61 6d 65 22 0d 0a 0d 0a 4b 45 59   FileName"....KEY
> 0340  53 0d 0a 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d   S..-------------
> 0350  2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d   ----------------
> 0360  37 64 35 33 38 39 31 37 31 33 30 36 35 63 0d 0a   7d53891713065c..
> 0370  43 6f 6e 74 65 6e 74 2d 44 69 73 70 6f 73 69 74   Content-Disposit
> 0380  69 6f 6e 3a 20 66 6f 72 6d 2d 64 61 74 61 3b 20   ion: form-data;
> 0390  6e 61 6d 65 3d 22 75 70 64 61 74 65 57 65 62 4c   name="updateWebL
> 03a0  69 6e 6b 22 0d 0a 0d 0a 53 75 62 6d 69 74 0d 0a   ink"....Submit..
> 03b0  2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d   ----------------
> 03c0  2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 37 64 35   -------------7d5
> 03d0  33 38 39 31 37 31 33 30 36 35 63 2d 2d 0d 0a      3891713065c--..
> *----------------------------------------------
>
> Thanks,
>
> Brent
>
>>>> ronald-freebsd8@klop.yi.org 6/20/2005 6:36:33 PM >>>
> On Tue, 21 Jun 2005 00:24:52 +0200, Brent Kynaston <bkynaston@trivir.com>
> wrote:
>
>> Ronald,
>>
>> Thanks for the quick response.
>>
>> Here is the HTTP header (captured by Ethereal) from the post where I've
>> inserted some Finnish data for one of the fields:
>>
>> *-----------------------------------------------------
>> Frame 161 (856 bytes on wire, 856 bytes captured)
>> Ethernet II, Src: 00:50:56:c0:00:08, Dst: 00:0c:29:cc:ba:ff
>> Internet Protocol, Src Addr: 192.168.189.1 (192.168.189.1), Dst Addr:
>> 192.168.189.201 (192.168.189.201)
>> Transmission Control Protocol, Src Port: 2631 (2631), Dst Port: http
>> (80), Seq: 1, Ack: 1, Len: 802
>> Hypertext Transfer Protocol
>>     POST
>> /GLPNetPortal/portal/portlet/COPDocuments?urlType=Action&novl-inst=c373e902f9802206764b000c296f1d50&wsrp-mode=view&wsrp-windowstate=normal&action=updateDoc&DocID=c373e90485927c3d323c000c29ccbaff
>> HTTP/1.1\r\n
>>         Request Method: POST
>>         Request URI:
>> /GLPNetPortal/portal/portlet/COPDocuments?urlType=Action&novl-inst=c373e902f9802206764b000c296f1d50&wsrp-mode=view&wsrp-windowstate=normal&action=updateDoc&DocID=c373e90485927c3d323c000c29ccbaff
>>         Request Version: HTTP/1.1
>>     Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
>> application/x-shockwave-flash, */*\r\n
>>     Referer:
>> http://192.168.189.201/GLPNetPortal/portal/portlet/COPDocuments?novl-inst=c373e902f9802206764b000c296f1d50\r\n
>>     Accept-Language: en-us\r\n
>>     Content-Type: multipart/form-data;
>> boundary=---------------------------7d522e2ec0e8a\r\n
>>     Accept-Encoding: gzip, deflate\r\n
>>     User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;
>> .NET CLR 1.1.4322)\r\n
>>     Host: 192.168.189.201\r\n
>>     Content-Length: 968\r\n
>>     Connection: Keep-Alive\r\n
>>     Cache-Control: no-cache\r\n
>>     Cookie: JSESSIONID=aa88969a240158baa93362f89c55e4f3\r\n
>>     \r\n
>> *-----------------------------------------------------
>>
>> Thanks,
>>
>> Brent
>>
>>>>> ronald-freebsd8@klop.yi.org 6/20/2005 5:31:19 PM >>>
>> On Mon, 20 Jun 2005 22:29:34 +0200, Brent Kynaston  
>> <bkynaston@trivir.com>
>> wrote:
>>
>>> I'm trying to post a multipart form with file data and text input
>>> fields.  The Portlet FileUpload code successfully parses the
>>> file data and text fields, except when I switch my keyboard layout to
>>> Finnish, Arabic, or any other non-English language.
>>>
>>> I've specified UTF-8 in an http-equiv META tag:
>>> <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
>>>
>>> I've tried setting various encodings on the PortletFileUpload instance,
>>> but have not been able to get it to work.  Is this
>>> broken in the current builds of commons-fileupload-1.1-dev.jar?
>>
>> Post a dump of the headers going over the wire. (See ngrep, ethereal or
>> another network sniffer.)
>
> A multipart/form-data post contains more headers in the body of the
> request. Those are the interesting ones.
> They are easiest to see with no file or a very small file upload.
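The per-part headers mentioned here sit between each boundary line and the blank line that precedes that part's content. The following JDK-only Java sketch pulls those header blocks out of a condensed, ASCII-only excerpt of the second capture (the Cyrillic bytes of DocTitle are left out). PartHeaders, sampleBody and headers are hypothetical names, and the naive split-on-boundary approach is for illustration only; it is not how FileUpload parses a request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: lists the header block of each part in a small,
// fully buffered multipart/form-data body. A real parser streams the
// body and handles many cases this sketch ignores.
class PartHeaders {
    // Boundary value from the second capture's Content-Type header.
    static final String BOUNDARY = "---------------------------7d53891713065c";

    // Condensed excerpt of the second capture: the DocTitle part
    // (Cyrillic bytes dropped) and the empty fileData part.
    static String sampleBody() {
        String delimiter = "--" + BOUNDARY + "\r\n";
        return delimiter
            + "Content-Disposition: form-data; name=\"DocTitle\"\r\n\r\n"
            + "ztest5\r\n"
            + delimiter
            + "Content-Disposition: form-data; name=\"fileData\"; filename=\"\"\r\n"
            + "Content-Type: application/octet-stream\r\n\r\n"
            + "\r\n"
            + "--" + BOUNDARY + "--\r\n";
    }

    // Returns the raw header lines of every part, in order.
    static List<String> headers(String body, String boundary) {
        List<String> result = new ArrayList<>();
        for (String part : body.split(Pattern.quote("--" + boundary))) {
            // Skip the empty preamble and the "--" that closes the body.
            if (part.isEmpty() || part.startsWith("--")) {
                continue;
            }
            // Headers run from after the delimiter's CRLF to the first blank line.
            int end = part.indexOf("\r\n\r\n");
            if (end >= 2) {
                result.add(part.substring(2, end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String h : headers(sampleBody(), BOUNDARY)) {
            System.out.println(h);
            System.out.println();
        }
    }
}
```

Applied to the excerpt, it shows what the hex dump also shows: the DocTitle part carries only a Content-Disposition header, with no Content-Type and no charset, so the server side has to decide which character set to use for its value.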


Did you try FileItem.getString(String encoding)? In this case, that would
be getString("UTF-8").
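A minimal, JDK-only sketch of the difference: it takes the DocTitle bytes from the second capture (offsets 0x60-0x7d) and decodes them once as ISO-8859-1, assumed here to be what a no-argument getString() falls back to when the part declares no charset, and once as UTF-8, which is what getString("UTF-8") asks for. DocTitleBytes and its methods are hypothetical names, not part of FileUpload.

```java
import java.nio.charset.StandardCharsets;

// Sketch: the raw DocTitle bytes from the multipart capture, decoded with an
// assumed ISO-8859-1 fallback and with UTF-8.
class DocTitleBytes {
    // Offsets 0x60-0x7d of the second capture: "ztest5" plus twelve
    // Cyrillic letters, two UTF-8 bytes each.
    static final byte[] RAW = hex(
        "7a7465737435"
        + "d184d18bd0b2d0b0d184"
        + "d0b2d18bd0b0d184d18bd0b2d0b0");

    // Turns a string of hex digit pairs into bytes.
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // One byte per character: every Cyrillic letter becomes two Latin-1 characters.
    static String asLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Two bytes per Cyrillic letter: the title the user actually typed.
    static String asUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Recovers a value already decoded as ISO-8859-1: Latin-1 maps bytes to
    // chars one-to-one, so re-encoding restores the original bytes.
    static String repairLatin1(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(asLatin1(RAW));
        System.out.println(asUtf8(RAW));
        System.out.println(repairLatin1(asLatin1(RAW)).equals(asUtf8(RAW))); // true
    }
}
```

With FileUpload itself, the equivalent is calling getString("UTF-8") on the items whose isFormField() returns true; repairLatin1 is only a fallback for values that have already been decoded with the wrong charset.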


-- 
  Ronald Klop
  Amsterdam, The Netherlands

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org

