axis-java-dev mailing list archives

From Davanum Srinivas <d...@yahoo.com>
Subject Re: REPOST: Character encoding problems sending UTF-8 back to client
Date Thu, 19 Feb 2004 20:19:42 GMT
Please try the latest CVS. I think there were some patches
(http://marc.theaimsgroup.com/?l=axis-dev&w=2&r=1&s=24896&q=b)

-- dims

--- Douglas Bitting <Douglas.Bitting@agile.com> wrote:
> I apologize if this has come across already, but I still haven't seen it on the mailing list
> after 18 hours.
> 
> All,
> 
> I can't really figure out if I'm doing something wrong here or if there is a defect involved.
> Basically, I have a Japanese string that I'm attempting
> to send back to the client.  However, when the client receives the string, it is mangled beyond
> repair.  I've put together a small test case, and
> include it (and its results) here.
> 
> Here is the method that is invoked via Axis on the server:
> 
>    public String getString() {
>       String str = "SDK \u30e9\u30a4\u30bb\u30f3\u30b9\u304c\u898b\u3064\u304b\u308a\u307e\u305b\u3093\u3067\u3057\u305f\u3002";
>       for (int ii = 0; ii < str.length(); ii++) {
>          System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
>       }
>       return str;
>    }
> 
> The output of this method is as follows:
> 
> char[0]: 83
> char[1]: 68
> char[2]: 75
> char[3]: 32
> char[4]: 12521
> char[5]: 12452
> char[6]: 12475
> char[7]: 12531
> char[8]: 12473
> char[9]: 12364
> char[10]: 35211
> ...
> 
> I generated client-side stubs via WSDL2Java, and put together a quick client that simply does
> this:
> 
>       String str = stub.getString();
>       for (int ii = 0; ii < str.length(); ii++) {
>          System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
>       }
> 
> This emits the following:
> 
> char[0]: 83
> char[1]: 68
> char[2]: 75
> char[3]: 32
> char[4]: 227
> char[5]: 402
> char[6]: 169
> char[7]: 227
> char[8]: 8218
> ...
> 
> The first 4 chars are returned properly, but everything after that is completely munged.
> 
> As near as I can tell, during serialization Axis is manually converting my string into a UTF-8
> encoded byte array.  However, the inverse operation
> does not appear to happen on the client side.  Am I doing something wrong here, or is this a
> defect?
> 
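The client-side values above are what you would expect if the UTF-8 bytes of the reply were decoded one byte per character instead of as UTF-8. Here is a minimal sketch of that effect. It assumes the wrong charset is windows-1252, which is only inferred from the values 402 and 8218 in the output and is not confirmed anywhere in this thread. The class and method names are hypothetical and are not part of Axis.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Assumption: the receiving side decodes with windows-1252.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Encode the string as UTF-8, then (wrongly) decode those bytes as windows-1252.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, CP1252);
    }

    public static void main(String[] args) {
        // First six characters of the test string from the server method.
        String mangled = misdecode("SDK \u30e9\u30a4");
        for (int ii = 0; ii < mangled.length(); ii++) {
            System.out.println("char[" + ii + "]: " + ((int) mangled.charAt(ii)));
        }
    }
}
```

U+30E9 is the three bytes E3 83 A9 in UTF-8, and windows-1252 maps those bytes to the code points 227, 402 and 169, which matches char[4] through char[6] in the client output.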
> Just for grins, I modified my client code to look like the following:
> 
>       String str = stub.getString();
> 
>       byte[] bytes = str.getBytes();
>       str = new String(bytes, "UTF-8");
> 
>       for (int ii = 0; ii < str.length(); ii++) {
>          System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
>       }
> 
> The additional code attempts to reverse the manual encoding done within Axis; however, it is not
> entirely successful:
> 
> char[0]: 83
> char[1]: 68
> char[2]: 75
> char[3]: 32
> char[4]: 12521
> char[5]: 12452
> char[6]: 12475
> char[7]: 12531
> char[8]: 12473
> char[9]: 65533
> char[10]: 63
> 
> The first 9 chars (char[0] through char[8]) are correct, but after that it goes downhill...
> 
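One likely reason the getBytes() workaround breaks at char[9] is that the round trip through a single-byte charset is lossy. str.getBytes() with no argument uses the platform default charset, and a few byte values (0x81 among them) have no windows-1252 mapping in Java, so they cannot survive being decoded and re-encoded. The sketch below reproduces that under the assumption that the default charset on the client was windows-1252. That is inferred from the output, not stated in the thread, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Assumption: the client's platform default charset is windows-1252.
    static final Charset CP1252 = Charset.forName("windows-1252");

    static String roundTrip(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        // What the client appears to receive: UTF-8 bytes decoded as windows-1252.
        String mangled = new String(utf8, CP1252);
        // The workaround: str.getBytes() with the default charset, then decode as UTF-8.
        byte[] recovered = mangled.getBytes(CP1252);
        return new String(recovered, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // char[8] and char[9] of the original string: U+30B9 survives, U+304C does not.
        String result = roundTrip("\u30b9\u304c");
        for (int ii = 0; ii < result.length(); ii++) {
            System.out.println("char[" + ii + "]: " + ((int) result.charAt(ii)));
        }
    }
}
```

U+304C is E3 81 8C in UTF-8. Byte 0x81 decodes to U+FFFD, which getBytes() then writes back out as '?' (63), so the original byte sequence is gone before the UTF-8 decode runs. This suggests the repair has to happen where the bytes are first decoded, not afterwards on the already-mangled string.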
> It's worth pointing out that the version of Axis I'm using is a few months old:
> 
> WSDL created by Apache Axis version: 1.2dev
> Built on Aug 26, 2003 (12:11:48 PDT)
> 
> I'm hesitant to update at this point due to project time constraints, but will if I have to.
> Has this scenario been addressed in the newer builds?
> 
> Thanks,
> --Doug
> 


=====
Davanum Srinivas - http://webservices.apache.org/~dims/
