cxf-issues mailing list archives

From Cyrille Chépélov (JIRA) <j...@apache.org>
Subject [jira] [Commented] (CXF-7491) TransformInInterceptor / TransformOutInterceptor assume UTF-8
Date Thu, 31 Aug 2017 10:57:00 GMT

    [ https://issues.apache.org/jira/browse/CXF-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148827#comment-16148827 ]

Cyrille Chépélov commented on CXF-7491:
---------------------------------------

A way to correct this is to change (looking at the stack top down):
* in TransformUtils.java lines 41-45, deprecate createNewReaderIfNeeded / createNewWriterIfNeeded;
add an overload with an "encoding" argument; have it call the overload of StaxUtils.createXMLStreamReader
(resp. StaxUtils.createXMLStreamWriter) that takes the encoding argument
* propagate the deprecation + additional 'encoding' argument overload in TransformUtils.java
lines 49-110 for the createTransform(Writer|Reader)IfNeeded methods
* add an argument to the protected method "createTransformReaderIfNeeded" in TransformInInterceptor.java,
and in the handleMessage method, extract the desired encoding from the Message structure (from
the org.apache.cxf.message.Message.ENCODING property), defaulting to UTF-8 if the property
is missing.
* symmetric changes in TransformOutInterceptor.java
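
The reader side of the steps above can be sketched with the JDK's StAX API alone. This is only an illustration, not the actual CXF patch: the class name and the helpers resolveEncoding, createReader and readRootText are hypothetical, and a plain Map stands in for the CXF Message (only the property key is taken from the comment above).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EncodingAwareReaderSketch {
    // Stand-in for the org.apache.cxf.message.Message.ENCODING property key.
    public static final String ENCODING_KEY = "org.apache.cxf.message.Message.ENCODING";

    // Hypothetical helper: take the encoding from the message, defaulting to UTF-8.
    public static String resolveEncoding(Map<String, Object> message) {
        Object value = message.get(ENCODING_KEY);
        return value instanceof String && !((String) value).isEmpty()
                ? (String) value
                : StandardCharsets.UTF_8.name();
    }

    // Encoding-aware counterpart of a "create reader if needed" helper (assumed shape).
    public static XMLStreamReader createReader(InputStream in, String encoding)
            throws XMLStreamException {
        return XMLInputFactory.newInstance().createXMLStreamReader(in, encoding);
    }

    // Returns the text content of the first element found in the payload.
    public static String readRootText(byte[] payload, Map<String, Object> message)
            throws XMLStreamException {
        XMLStreamReader reader =
                createReader(new ByteArrayInputStream(payload), resolveEncoding(message));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // ISO-8859-1 bytes with no XML declaration, as in the reported scenario.
        // \u00e9 is "e with acute accent", written as an escape to keep this source ASCII.
        byte[] payload = "<name>Ch\u00e9p\u00e9lov</name>".getBytes(StandardCharsets.ISO_8859_1);
        Map<String, Object> message = new HashMap<>();
        message.put(ENCODING_KEY, "ISO-8859-1");
        System.out.println(readRootText(payload, message));
    }
}
```

The writer side would presumably mirror this with XMLOutputFactory.createXMLStreamWriter(OutputStream, String).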

Question: in the TransformInInterceptor class, the createTransformReaderIfNeeded method lacks
a way to convey the desired encoding, and is protected, making it an extension point. Simply
deprecating the method + adding a charset-aware overload would be dangerous for subclasses
of TransformInInterceptor: handleMessage would call the new overload, so an existing override of the
old method would silently stop being used and the subclass would revert to TransformInInterceptor's
own behaviour.
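
The hazard can be reproduced with a toy model. The classes below are hypothetical stand-ins (strings instead of XMLStreamReader, made-up class names), not CXF code; they only show how an existing override is bypassed once the base class starts calling a new overload.

```java
public class OverloadBypassSketch {

    // Stand-in for the interceptor after "deprecate + add overload".
    public static class BaseInterceptor {
        public String handleMessage(String payload, String encoding) {
            // The base class now calls the new, encoding-aware overload.
            return createReader(payload, encoding);
        }

        @Deprecated
        protected String createReader(String payload) {
            return "base(" + payload + ")";
        }

        protected String createReader(String payload, String encoding) {
            return "base(" + payload + ", " + encoding + ")";
        }
    }

    // A subclass written against the old API: it only overrides the old method.
    public static class LegacySubclass extends BaseInterceptor {
        @Override
        protected String createReader(String payload) {
            return "custom(" + payload + ")";
        }
    }

    public static void main(String[] args) {
        // The override is never reached; prints: base(<a/>, ISO-8859-1)
        System.out.println(new LegacySubclass().handleMessage("<a/>", "ISO-8859-1"));
    }
}
```

The subclass still compiles and links, which is exactly why the behaviour change would go unnoticed.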

Based on the principle of least surprise, I see three ways:
# intentionally break compatibility by adding the "encoding" argument to createTransformReaderIfNeeded
in TransformInInterceptor (+symmetric in TransformOutInterceptor), forcing any subclass to
be updated before being usable again. This also breaks binary compatibility for the sake of
fixing a fairly local issue
# avoid breaking compatibility by finding a secondary channel aside from method parameters
to convey the desired encoding from handleMessage to the createTransformReaderIfNeeded method.
This would inevitably be ugly but would preserve binary compatibility (while it wouldn't
necessarily 'magically' update the behaviour of TransformInInterceptor subclasses to follow
the desired encoding, it would avoid surprises).
# deprecate TransformInInterceptor / TransformOutInterceptor (keeping the interface as is),
implement the changes proposed in point 1 into copies (TransformInCharsetAwareInterceptor
/ TransformOutCharsetAwareInterceptor), update StaxTransformFeature to use the new interceptor
implementations.
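
For comparison, option 2 could look roughly like the sketch below. Using a ThreadLocal as the secondary channel is an assumption made here for illustration (the comment does not name a mechanism), and all class and method names are hypothetical stand-ins, with strings in place of XMLStreamReader.

```java
public class SideChannelSketch {

    public static class Interceptor {
        // Hypothetical side channel: the encoding travels outside the method signature.
        private static final ThreadLocal<String> CURRENT_ENCODING = new ThreadLocal<>();

        public String handleMessage(String payload, String encoding) {
            CURRENT_ENCODING.set(encoding != null ? encoding : "UTF-8");
            try {
                return createReader(payload); // protected signature unchanged
            } finally {
                CURRENT_ENCODING.remove();    // do not leak state to the next message
            }
        }

        // Available to subclasses that want to honour the encoding.
        protected static String currentEncoding() {
            String encoding = CURRENT_ENCODING.get();
            return encoding != null ? encoding : "UTF-8";
        }

        protected String createReader(String payload) {
            return "base(" + payload + ", " + currentEncoding() + ")";
        }
    }

    // An existing subclass keeps working, but does not pick up the encoding by itself.
    public static class LegacySubclass extends Interceptor {
        @Override
        protected String createReader(String payload) {
            return "custom(" + payload + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(new Interceptor().handleMessage("<a/>", "ISO-8859-1"));
        System.out.println(new LegacySubclass().handleMessage("<a/>", "ISO-8859-1"));
    }
}
```

The extra per-message set/remove and the hidden coupling between the two methods are the kind of cost that makes this option unattractive next to #3.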

While solution #1 is simpler, #3 seems to avoid surprise behaviour changes and breaking binary
compatibility while avoiding ugly and performance-consuming hacks. 

Proceeding to implement #3 unless advised otherwise.

(I, of course, have no way to ask and get the remote IBMi system to speak UTF-8 in any sort
of reasonable time frame)


> TransformInInterceptor / TransformOutInterceptor assume UTF-8
> -------------------------------------------------------------
>
>                 Key: CXF-7491
>                 URL: https://issues.apache.org/jira/browse/CXF-7491
>             Project: CXF
>          Issue Type: Bug
>          Components: Soap Binding
>    Affects Versions: 3.1.11, 3.1.12
>         Environment: client Linux/Java/CXF 
> server IBMi AS/400
>            Reporter: Cyrille Chépélov
>
> When talking to a server using IBMi / RPG-based software and a SOAP gateway,
> the returned SOAP message contains XML encoded as ISO-8859-1; the HTTP headers do specify
> a content type of xml+soap with character set ISO-8859-1; however, the XML message itself includes
> no character set declaration.
> Due to discrepancies between the official WSDL for the SOAP message and the remote implementation,
> a couple of transforms had to be deployed. This works fine as long as the exchanged messages
> actually conform to US-ASCII (no diacritics), but whenever any character encoded differently
> between ISO-8859-1 and UTF-8 is used, the TransformInInterceptor fails to parse the text,
> as the XMLStreamReader is built to expect UTF-8 and actually receives ISO-8859-1 input.
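
The underlying mismatch described in the issue can be shown without CXF or an XML parser. The sketch below uses only java.nio.charset; the class and method names are hypothetical, and the payloads are made-up examples rather than messages from the real system.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchSketch {

    // Decodes strictly: malformed or unmappable input raises an exception
    // instead of being replaced with U+FFFD.
    public static String strictDecode(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static boolean decodesAs(byte[] bytes, Charset charset) {
        try {
            strictDecode(bytes, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // \u00e9 is "e with acute accent"; in ISO-8859-1 it is the single byte 0xE9.
        byte[] accented = "<name>Ch\u00e9p\u00e9lov</name>".getBytes(StandardCharsets.ISO_8859_1);
        byte[] ascii = "<name>Chepelov</name>".getBytes(StandardCharsets.ISO_8859_1);

        // Pure ASCII content is byte-identical in both encodings, so it decodes either way.
        System.out.println(decodesAs(ascii, StandardCharsets.UTF_8));         // true
        // 0xE9 followed by 'p' is not a valid UTF-8 sequence.
        System.out.println(decodesAs(accented, StandardCharsets.UTF_8));      // false
        System.out.println(decodesAs(accented, StandardCharsets.ISO_8859_1)); // true
    }
}
```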



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
