axis-java-dev mailing list archives

From "David Jencks (JIRA)" <axis-...@ws.apache.org>
Subject [jira] Created: (AXIS-1971) problem with BOM and character set encoding
Date Mon, 02 May 2005 22:18:05 GMT
problem with BOM and character set encoding
-------------------------------------------

         Key: AXIS-1971
         URL: http://issues.apache.org/jira/browse/AXIS-1971
     Project: Axis
        Type: Bug
  Components: Serialization/Deserialization  
    Versions: current (nightly)    
    Reporter: David Jencks
 Attachments: MessageContext.diff

I'm encountering this problem in the Geronimo Axis integration, so it's possible that it is
not an Axis bug, but I don't see how it could be anything else.

I send a UTF-16 character set encoded message to the server, and get back a message that starts
with a byte order mark but claims to be UTF-8.

I've copied the code from AxisServlet that sets the character encoding on the response to
the equivalent place in geronimo code.

After tracing through what is happening, I find that during the return from the invoke call,
leaving the HandlerChainImpl (postInvoke line 206) the entire response is serialized with
the default UTF-8 character set encoding into a ByteArray.

After invoke returns, the code from AxisServlet changes the character set encoding to UTF-16
and writes out the message.  However, since the message was already serialized into a byte array
buffer, this apparently has the effect of writing out the UTF-16 byte order mark followed by the
bytes that were produced using UTF-8.
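
The symptom can be reproduced outside Axis. The following is a minimal sketch, not Axis or
Geronimo code: the class and method names are hypothetical, and it only assumes that the body
is encoded as UTF-8 first and that a UTF-16 big-endian byte order mark (0xFE 0xFF) is written
in front of it afterwards.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use any Axis API.
public class BomMismatchDemo {

    // Mimics the reported symptom: the text is serialized as UTF-8 first,
    // then a UTF-16 big-endian byte order mark is written in front of it.
    static byte[] bomThenUtf8(String xml) {
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0xFE);
        out.write(0xFF);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    // Consistent output: Java's "UTF-16" encoder writes the big-endian
    // byte order mark itself and then encodes every character as UTF-16.
    static byte[] consistentUtf16(String xml) {
        return xml.getBytes(StandardCharsets.UTF_16);
    }

    // Decodes bytes as UTF-16; the decoder consumes a leading byte order mark.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String xml = "<a/>";
        // The consistent stream round-trips to the original text.
        System.out.println(decodeUtf16(consistentUtf16(xml)).equals(xml));
        // The mixed stream does not: each pair of UTF-8 bytes is read as one UTF-16 unit.
        System.out.println(decodeUtf16(bomThenUtf8(xml)).equals(xml));
    }
}
```

A client that trusts the byte order mark reads the mixed stream as UTF-16 and gets garbage; a
client that trusts the declared UTF-8 encoding trips over the two leading bytes.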

This can be fixed by making the message context set the response message character set encoding
when the response message is set on the message context (see attached patch).
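
The attached patch is not reproduced here. As a rough illustration of the idea only, the toy
model below fixes the encoding at the moment the response is attached to the context, before
anything is serialized. ToyMessage, ToyContext and serializeResponse are hypothetical names,
not Axis classes, and copying the request's encoding onto the response is an assumption made
for this sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical toy model; none of these types are part of Axis.
public class EncodingPropagationSketch {

    // Stand-in for a message that serializes with its current encoding.
    static class ToyMessage {
        private final String text;
        private Charset encoding = StandardCharsets.UTF_8; // default, as in the report

        ToyMessage(String text) { this.text = text; }
        void setEncoding(Charset encoding) { this.encoding = encoding; }
        Charset getEncoding() { return encoding; }
        byte[] serialize() { return text.getBytes(encoding); }
    }

    // Stand-in for a message context holding a request and a response.
    static class ToyContext {
        private final ToyMessage request;
        private ToyMessage response;

        ToyContext(ToyMessage request) { this.request = request; }

        // Assumption of this sketch: the response inherits the request's
        // encoding as soon as it is set, so any later serialization uses it.
        void setResponseMessage(ToyMessage response) {
            response.setEncoding(request.getEncoding());
            this.response = response;
        }

        ToyMessage getResponseMessage() { return response; }
    }

    // Helper: builds a context for a request in the given encoding,
    // attaches a response, and serializes the response immediately.
    static byte[] serializeResponse(Charset requestEncoding, String responseText) {
        ToyMessage request = new ToyMessage("<request/>");
        request.setEncoding(requestEncoding);
        ToyContext context = new ToyContext(request);
        context.setResponseMessage(new ToyMessage(responseText));
        return context.getResponseMessage().serialize();
    }

    public static void main(String[] args) {
        byte[] bytes = serializeResponse(StandardCharsets.UTF_16, "<a/>");
        // Byte order mark plus two bytes per character.
        System.out.println(bytes.length);
    }
}
```

Because the encoding is already in place when the early serialization happens, the buffered
bytes and the byte order mark agree.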

I find the logic that determines the response character set encoding byzantine and would prefer
to simplify it until I can understand how it works. To proceed, I need answers to these
questions:

1. Under what circumstances would a response message be in a different character set encoding
than a request?

2. Which user/application code should be able to set the response character set encoding and
how should it do so?

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

