cocoon-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 25694] New: - [PATCH] JSPEngineImplNamedDispatcherInclude incorrectly converts bytes to characters
Date Mon, 22 Dec 2003 11:03:00 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25694>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25694

[PATCH] JSPEngineImplNamedDispatcherInclude incorrectly converts bytes to characters

           Summary: [PATCH] JSPEngineImplNamedDispatcherInclude incorrectly
                    converts bytes to characters
           Product: Cocoon 2
           Version: Current CVS 2.1
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: sitemap components
        AssignedTo: dev@cocoon.apache.org
        ReportedBy: johan@hippo.nl


The MyServletOutputStream class in the JSPEngineImplNamedDispatcherInclude class 
is an output stream. All output streams have to implement the write(byte) 
method. The comment in the implementation acknowledges this, but also states 
that the method is not used. That is not true: write(byte) does get invoked 
when MyServletOutputStream is used directly as an output stream rather than 
through MyServletOutputStream's PrintWriter.

Its current implementation writes the byte as a character to the PrintWriter. 
When the byte is in the range 0..127 this poses no problem; the UTF-8 encoding 
of that character is the same single byte. When the byte is in the range 
-128..-1 the data gets corrupted because of the conversion to an int and 
subsequently to a char: the negative value is first sign-extended to an int. 
Then the lower 16 bits are used as the char value, which results in a char 
value in the range 65408..65535. The PrintWriter (using the UTF-8 encoding) 
will output each of these characters as three bytes.
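
The conversion described above can be reproduced in isolation. The following 
is a minimal sketch, not code taken from Cocoon; the class and method names are 
hypothetical and only mimic the byte -> int -> char steps.

```java
class SignExtensionDemo {
    // Hypothetical helper: repeats the conversion chain described in the report.
    static char toCharLikeTheBug(byte b) {
        int widened = b;        // sign-extended: (byte) 0xC3 becomes -61 (0xFFFFFFC3)
        return (char) widened;  // only the low 16 bits survive: 0xFFC3
    }

    public static void main(String[] args) {
        System.out.println((int) toCharLikeTheBug((byte) 0x41)); // 65, unchanged
        System.out.println((int) toCharLikeTheBug((byte) 0xC3)); // 65475
        System.out.println((int) toCharLikeTheBug((byte) -128)); // 65408
        System.out.println((int) toCharLikeTheBug((byte) -1));   // 65535
    }
}
```

Masking with 0xFF before the cast would give 195 for the byte 0xC3, but 
writing that char through a UTF-8 PrintWriter would still re-encode it as two 
bytes, so masking alone does not preserve the original byte stream.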

If MyServletOutputStream is used to stream bytes in UTF-8 encoding and this data 
contains the representation for an 'e' with an umlaut (2 bytes in UTF-8 
encoding), the final data contains six bytes instead of two.
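
The six-byte result can be checked with a small stand-alone program. This is 
a sketch under assumptions: it does not use the Cocoon classes, and 
writeBytesAsChars is a hypothetical stand-in for the faulty write path (each 
byte cast to a char and handed to a UTF-8 PrintWriter).

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

class UmlautCorruptionDemo {
    // Hypothetical stand-in for the faulty path: every byte is written as a char.
    static byte[] writeBytesAsChars(byte[] input) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter writer =
                new PrintWriter(new OutputStreamWriter(sink, StandardCharsets.UTF_8));
        for (byte b : input) {
            writer.write((char) b); // negative bytes become chars 65408..65535
        }
        writer.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] umlaut = "\u00EB".getBytes(StandardCharsets.UTF_8); // C3 AB
        System.out.println(umlaut.length);                    // 2
        System.out.println(writeBytesAsChars(umlaut).length); // 6
    }
}
```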

Instead of writing the byte to the PrintWriter, during which it gets converted 
to a char, MyServletOutputStream should write to the underlying 
ByteArrayOutputStream. Before doing so, it should synchronize the PrintWriter 
and the ByteArrayOutputStream.
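
The proposed fix might look roughly like the sketch below. It is not the 
attached patch: BufferedResponseStream is a hypothetical class that extends 
java.io.OutputStream rather than ServletOutputStream so that it compiles 
without the servlet API, and "synchronize" is assumed here to mean flushing 
the PrintWriter before the raw byte is written, so that earlier character 
output is not reordered. In java.io.OutputStream the single-byte method is 
declared as write(int).

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for MyServletOutputStream; names are illustrative only.
class BufferedResponseStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final PrintWriter writer =
            new PrintWriter(new OutputStreamWriter(buffer, StandardCharsets.UTF_8));

    PrintWriter getWriter() {
        return writer;
    }

    @Override
    public void write(int b) {
        writer.flush();   // assumed meaning of "synchronize": drain pending chars first
        buffer.write(b);  // ByteArrayOutputStream stores only the low 8 bits, unchanged
    }

    byte[] toByteArray() {
        writer.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        BufferedResponseStream out = new BufferedResponseStream();
        out.getWriter().print("a");
        out.write(0xC3);  // raw UTF-8 bytes of 'e' with an umlaut
        out.write(0xAB);
        out.getWriter().print("b");
        System.out.println(out.toByteArray().length); // 4
    }
}
```

Flushing on every byte is simple but not cheap; a real implementation could 
also override write(byte[], int, int) to flush once per call.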
