tomcat-dev mailing list archives

From Rick Beaubien
Subject Re: BugRat Report #323 has been filed.
Date Wed, 01 Nov 2000 17:15:26 GMT
Please note that the problem I am reporting does not pertain to UTF-8
encoding in URLs, but to UTF-8 encoded values above the ASCII range in the
CONTENT of the HTML pages prepared by my servlet.  The content of my pages
includes UTF-8 encoded Unicode values in the CJK range.  For example, see
the following test page delivered by my servlet under the early, correctly
functioning version of Tomcat 3.1 (I recommend using IE to view this page;
if you have Chinese font support installed you will be able to see Chinese
characters in the left-hand frames.  But even if the Chinese characters show
up as boxes for lack of the proper font, you can tell by looking at the
page source that the 3-byte UTF-8 encodings for the Chinese characters are
being delivered to the browser):

Under the newest release of Tomcat 3.1, the Chinese characters in question
all get translated to question marks by Tomcat before being delivered to
the browser.
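
The symptom described above can be reproduced outside Tomcat with a
minimal Java sketch. It shows that a CJK character takes three bytes in
UTF-8, and that the same text becomes question marks when it is encoded
with a charset that cannot represent it. The use of ISO-8859-1 here is
only an assumption chosen to illustrate the effect; nothing in this
report confirms that it is what the newer Tomcat 3.1 release actually
does. The class name, method names, and sample string are hypothetical,
and StandardCharsets requires a much newer JDK than the 1.2.2 named in
the report.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Demo {
    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encode with ISO-8859-1 (an assumed, illustrative charset) and decode
    // again. Characters the charset cannot represent are replaced by '?'.
    static String viaLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String cjk = "\u4e2d\u6587"; // two sample CJK ideographs
        System.out.println("UTF-8 bytes: " + utf8Length(cjk));
        System.out.println("Via ISO-8859-1: " + viaLatin1(cjk));
    }
}
```

Comparing the two results against the page source received by the
browser is one way to tell whether the bytes were altered before
delivery or merely displayed without a suitable font.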

The patches that I see cited at the address provided by Kim below all seem
to pertain to UTF-8 encoding in URLs, not in page content.  Stefan van den
Oord and I are reporting problems with the handling of UTF-8 encoding in
the <body> of the HTML pages prepared by our servlets.


Rick Beaubien

At 12:31 PM 11/01/2000 +0900, you wrote:
>Please check
>On Tue, 31 Oct 2000, BugRat Mail System wrote:
>> Bug report #323 has just been filed.
>> You can view the report at the following URL:
>>    <>
>> REPORT #323 Details.
>> Project: Tomcat
>> Category: Bug Report
>> SubCategory: New Bug Report
>> Class: swbug
>> State: received
>> Priority: high
>> Severity: critical
>> Confidence: public
>> Environment: 
>>    Release: 3.1
>>    JVM Release: jdk1.2.2
>>    Operating System: Solaris
>>    OS Release: 2.6
>>    Platform: Unix
>> Synopsis: 
>> Tomcat 3.1 mishandles UTF-8 encoded text above the ASCII range
>> Description:
>> My servlet produces UTF-8 encoded HTML pages that 
>> include encoded Unicode character values in the CJK range; 
>> I set the HttpServletResponse.ContentType in these cases to
>> "text/html;charset=utf-8".  Under an early issue of 
>> Tomcat 3.1, the UTF-8 encoded CJK characters got submitted 
>> to the browser properly; IE5 with the proper fonts 
>> installed was able to display these characters just fine.  
>> Under the newest issue of Tomcat 3.1, however, the CJK 
>> characters in the HTML pages are replaced with "?"s before 
>> they are submitted to the browser! 
>> For another report of the same problem, see 
>> Stefan van den Oord's memo to the developers list on
>> May 4, 2000:
>> )

Rick Beaubien 

Software Engineer: Research and Development
Library Systems Office
Rm 386 Doe Library
University of California
Berkeley, CA 94720-6000
