cxf-issues mailing list archives

From "Lars Svensson (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CXF-4533) Encoding error in CachedOutputStream when double-byte char is on 1024 byte boundary
Date Tue, 02 Oct 2012 08:51:07 GMT
Lars Svensson created CXF-4533:
----------------------------------

             Summary: Encoding error in CachedOutputStream when double-byte char is on 1024 byte boundary
                 Key: CXF-4533
                 URL: https://issues.apache.org/jira/browse/CXF-4533
             Project: CXF
          Issue Type: Bug
    Affects Versions: 2.6.2, 2.3
            Reporter: Lars Svensson


Hi,

We occasionally see encoding errors where a small number of two-byte characters are encoded
incorrectly in an otherwise correctly encoded message. I have traced the problem to the
writeCacheTo method of CachedOutputStream, where the temporary cache file is read 1024 bytes
at a time and each chunk is converted to a String before being appended to the StringBuilder.
If a 1024-byte boundary falls between the two bytes of a two-byte character, that character
is decoded incorrectly.
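
The failure mode can be reproduced in isolation. The following is a minimal sketch, not CXF code: the class SplitCharDemo and the method decodeInChunks are hypothetical names used only for illustration. It decodes the UTF-8 bytes of "før" in two independent chunks, the way a chunked read loop would, and shows that the result is only correct when the split does not fall inside the two-byte character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of CXF.
public class SplitCharDemo {

    // Decodes data[0..split) and data[split..length) independently and
    // concatenates the results, mimicking a loop that turns each chunk
    // read from the file into its own String.
    static String decodeInChunks(byte[] data, int split) {
        String head = new String(data, 0, split, StandardCharsets.UTF_8);
        String tail = new String(data, split, data.length - split, StandardCharsets.UTF_8);
        return head + tail;
    }

    public static void main(String[] args) {
        // "f\u00f8r" is "før"; in UTF-8 this is the four bytes 66 c3 b8 72.
        String expected = "f\u00f8r";
        byte[] data = expected.getBytes(StandardCharsets.UTF_8);

        // Split after 'f': both chunks contain only complete characters.
        System.out.println(decodeInChunks(data, 1).equals(expected));

        // Split between c3 and b8: neither chunk can decode its half.
        System.out.println(decodeInChunks(data, 2).equals(expected));
    }
}
```

With the split at index 2, each half of the character is malformed on its own, so the decoder substitutes replacement characters and the original o-slash is lost.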


public void writeCacheTo(StringBuilder out, String charsetName) throws IOException {
   flush();
   if (inmem) {
      if (currentStream instanceof ByteArrayOutputStream) {
         byte[] bytes = ((ByteArrayOutputStream)currentStream).toByteArray();
         out.append(IOUtils.newStringFromBytes(bytes, charsetName));
      } else {
         throw new IOException("Unknown format of currentStream");
      }
   } else {
      // read the file
      FileInputStream fin = new FileInputStream(tempFile);
      byte bytes[] = new byte[1024];
      int x = fin.read(bytes);
      while (x != -1) {
         out.append(IOUtils.newStringFromBytes(bytes, charsetName, 0, x));
         x = fin.read(bytes);
      }
      fin.close();
   }
}


Below are a couple of lines from a hex dump of the cache file. The second o-slash (bytes
c3 b8) straddles the 1024-byte boundary at offset 0x1fc00 and is therefore corrupted in the
outgoing message:

0001fbe0:  66 66 65 6e 74 6c 69 67 20 66 c3 b8 72 74 69 64 73 70 65 6e 73 69 6f 6e 2c 20 73 6f 6d 20 66 c3  ffentlig førtidspension, som f?
0001fc00:  b8 72 65 72 20 74 69 6c 2c 20 61 74 20 6d 65 64 6c 65 6d 3c 2f 70 67 66 3a 52 65 70 75 72 63 68  ?rer til, at medlem</pgf:Repurch
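
One possible way to avoid the problem is to let a Reader do the decoding, since an InputStreamReader keeps partial byte sequences between reads. The sketch below is an assumption about how a fix could look, not the change made in CXF: the class ReaderBasedCopy and the method appendDecoded are hypothetical names, and an in-memory stream stands in for the temp file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Arrays;

// Hypothetical helper, not part of CXF.
public class ReaderBasedCopy {

    // Decodes the whole stream with one decoder and appends the characters
    // to out. The buffer holds chars, not bytes, so a multi-byte character
    // is never cut in half by the buffer size.
    static void appendDecoded(InputStream in, String charsetName, StringBuilder out)
            throws IOException {
        try (Reader reader = new InputStreamReader(in, charsetName)) {
            char[] chars = new char[1024];
            int n;
            while ((n = reader.read(chars)) != -1) {
                out.append(chars, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // 1023 ASCII bytes followed by c3 b8, so the two-byte character
        // straddles the 1024-byte boundary as in the hex dump.
        byte[] data = new byte[1025];
        Arrays.fill(data, (byte) 'a');
        data[1023] = (byte) 0xc3;
        data[1024] = (byte) 0xb8;

        StringBuilder out = new StringBuilder();
        appendDecoded(new ByteArrayInputStream(data), "UTF-8", out);

        // Check that the last character is the o-slash (U+00F8).
        System.out.println(out.charAt(out.length() - 1) == '\u00f8');
    }
}
```

In writeCacheTo the same loop could wrap the FileInputStream for tempFile; whether that fits the rest of the class is for the maintainers to judge.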



