axis-java-dev mailing list archives

From "Douglas Bitting" <Douglas.Bitt...@agile.com>
Subject REPOST: Character encoding problems sending UTF-8 back to client
Date Thu, 19 Feb 2004 20:00:28 GMT
I apologize if this has come across already, but I still haven't seen it on the mailing list
after 18 hours.

All,

I can't really figure out whether I'm doing something wrong here or whether a defect is involved.
Basically, I have a Japanese string that I'm attempting to send back to the client. However,
when the client receives the string, it is mangled beyond repair. I've put together a small
test case and included it (and its results) here.

Here is the method that is invoked via Axis on the server:

   public String getString() {
      String str = "SDK \u30e9\u30a4\u30bb\u30f3\u30b9\u304c\u898b\u3064\u304b\u308a\u307e\u305b\u3093\u3067\u3057\u305f\u3002";
      for (int ii = 0; ii < str.length(); ii++) {
         System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
      }
      return str;
   }

The output of this method is as follows:

char[0]: 83
char[1]: 68
char[2]: 75
char[3]: 32
char[4]: 12521
char[5]: 12452
char[6]: 12475
char[7]: 12531
char[8]: 12473
char[9]: 12364
char[10]: 35211
...

I generated client-side stubs via WSDL2Java and put together a quick client that simply does
this:

      String str = stub.getString();
      for (int ii = 0; ii < str.length(); ii++) {
         System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
      }

This emits the following:

char[0]: 83
char[1]: 68
char[2]: 75
char[3]: 32
char[4]: 227
char[5]: 402
char[6]: 169
char[7]: 227
char[8]: 8218
...

The first 4 chars are returned properly, but everything after that is completely munged.

As near as I can tell, during serialization Axis is manually converting my string into a UTF-8
encoded byte array.  However, the inverse operation
does not appear to happen on the client side.  Am I doing something wrong here, or is this
a defect?
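The client output is consistent with one specific mismatch: the UTF-8 bytes from the server being decoded on the client as windows-1252 (Cp1252), the default encoding of many Western-locale Windows JVMs. That the client uses windows-1252 is an assumption; the message does not say which platform or default encoding the client runs with. The sketch below reproduces the effect without Axis by encoding the test string as UTF-8 and decoding it with an explicit windows-1252 charset; the class and method names are hypothetical. For example, \u30e9 encodes to the bytes E3 83 A9, which windows-1252 maps to 227, 402 and 169, the values printed for char[4] through char[6].

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Same test string that the server-side getString() returns.
    static final String ORIGINAL =
        "SDK \u30e9\u30a4\u30bb\u30f3\u30b9\u304c\u898b\u3064\u304b\u308a\u307e\u305b\u3093\u3067\u3057\u305f\u3002";

    // Assumption: the client decodes the response with windows-1252 (Cp1252).
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Encode as UTF-8 (the bytes the server sends), then decode those bytes
    // with the wrong charset, as the client appears to do.
    static String mangle(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, CP1252);
    }

    public static void main(String[] args) {
        String received = mangle(ORIGINAL);
        // Print the same first nine chars that the client output shows.
        for (int ii = 0; ii < 9; ii++) {
            System.out.println("char[" + ii + "]: " + ((int) received.charAt(ii)));
        }
    }
}
```

Running main prints 83, 68, 75, 32, 227, 402, 169, 227 and 8218, the same nine values as the client output above.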

Just for grins, I modified my client code to look like the following:

      String str = stub.getString();

      // Re-encode with the JVM's default charset, then decode the bytes as UTF-8.
      byte[] bytes = str.getBytes();
      str = new String(bytes, "UTF-8");

      for (int ii = 0; ii < str.length(); ii++) {
         System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
      }

The additional code attempts to reverse the manual encoding done within Axis; however, it
is not entirely successful:

char[0]: 83
char[1]: 68
char[2]: 75
char[3]: 32
char[4]: 12521
char[5]: 12452
char[6]: 12475
char[7]: 12531
char[8]: 12473
char[9]: 65533
char[10]: 63

The first 9 chars (char[0] through char[8]) are correct, but after that it goes downhill...
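Under the same windows-1252 assumption, the workaround fails at char[9] for a byte-level reason. \u304c encodes in UTF-8 as E3 81 8C, and 0x81 is one of the byte values windows-1252 leaves unassigned, so Java decodes it as U+FFFD. getBytes() cannot encode U+FFFD in windows-1252 and substitutes '?' (0x3F), so the original 0x81 is already gone before the UTF-8 decode runs; E3 3F then decodes as U+FFFD (65533) followed by '?' (63). The hypothetical sketch below spells out the charsets that str.getBytes() otherwise takes from the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {

    // Same test string that the server-side getString() returns.
    static final String ORIGINAL =
        "SDK \u30e9\u30a4\u30bb\u30f3\u30b9\u304c\u898b\u3064\u304b\u308a\u307e\u305b\u3093\u3067\u3057\u305f\u3002";

    // Assumption: windows-1252 is the client JVM's default charset.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // What the client receives: UTF-8 bytes decoded as windows-1252.
    // Byte 0x81 has no windows-1252 mapping, so it becomes U+FFFD here.
    static String received() {
        return new String(ORIGINAL.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // The workaround from the message, with the default charset made explicit.
    // U+FFFD is unmappable in windows-1252, so getBytes() writes '?' (0x3F).
    static String repaired() {
        byte[] bytes = received().getBytes(CP1252);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 bytes of U+304C, the char at index 9.
        for (byte b : "\u304c".getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();

        String fixed = repaired();
        for (int ii = 0; ii <= 10; ii++) {
            System.out.println("char[" + ii + "]: " + ((int) fixed.charAt(ii)));
        }
    }
}
```

main prints E3 81 8C, then char[0] through char[10] with the same values as the output above: nine correct chars, then 65533 and 63. Since windows-1252 maps every one of its unassigned bytes to the same U+FFFD, re-encoding on the client cannot tell which byte was there, which is why this kind of client-side repair cannot work in general.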

It's worth pointing out that the version of Axis I'm using is a few months old:

WSDL created by Apache Axis version: 1.2dev
Built on Aug 26, 2003 (12:11:48 PDT)

I'm hesitant to update at this point due to project time constraints, but will if I have to.
 Has this scenario been addressed in the newer builds?

Thanks,
--Doug

