tomcat-users mailing list archives

From David Wall <d.w...@computer.org>
Subject Tomcat 7.0.19 character encoding issue with JSP
Date Thu, 01 Sep 2011 02:41:20 GMT
I'm trying to track down a character encoding issue that I've been 
having, but don't really understand. Hopefully one of you will know what 
the answer is.

I am using CKEditor to generate some user-specified HTML. CKEditor 
offers an "insert special character" function that often creates named 
HTML entities like "&yen;", but it also inserts a few, such as the solid 
black right arrow, as literal UTF-8 characters rather than entity 
references. I then generate a JSP file that includes the HTML produced 
by CKEditor.

Initially I was using the Java 6 FileWriter without specifying a 
character encoding, and I'd end up with a generated JSP where the HTML 
entities were fine but the other special characters appeared as just 
'?' in the file. I changed to FileOutputStream/OutputStreamWriter with 
"UTF-8" specified, and the JSP looked good:

<%@ page contentType="text/html; charset=utf-8" session="true" 
isELIgnored="true" %>
...
<p>These have issues: ► Ŵ but these don&#39;t: &trade; &hArr; &diams;
&aacute; &para; &yen;</p>
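
The difference between the two ways of writing the file can be sketched as follows. This is a minimal illustration, not the code from the original application: the class and method names are hypothetical, it uses Java 7+ syntax (try-with-resources, StandardCharsets) rather than Java 6, and it assumes the windows-1252 charset is available in the running JVM. It writes the two problem characters with an explicit charset and prints the bytes that end up on disk.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical sketch class; not part of the original application.
final class JspEncodingSketch {

    // Writes text to a temporary file with an explicit charset and
    // returns the bytes that actually landed on disk.
    static byte[] writeAndReadBack(String text, Charset charset) {
        try {
            File tmp = File.createTempFile("sketch", ".jsp");
            try {
                try (Writer out = new OutputStreamWriter(
                        new FileOutputStream(tmp), charset)) {
                    out.write(text);
                }
                return Files.readAllBytes(tmp.toPath());
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u25BA \u0174"; // right arrow, space, W with circumflex

        // Explicit UTF-8: both characters survive as multi-byte sequences.
        System.out.println(hex(writeAndReadBack(text, StandardCharsets.UTF_8)));
        // prints: E2 96 BA 20 C5 B4

        // Cp1252 has no mapping for either character, so the writer
        // substitutes '?' (0x3F) for each of them.
        System.out.println(hex(writeAndReadBack(text, Charset.forName("windows-1252"))));
        // prints: 3F 20 3F
    }
}
```

The second case corresponds to what a FileWriter without an explicit encoding would do on a JVM whose default is Cp1252.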

With UTF-8 encoding when writing the JSP, the right arrow and the Latin 
W with circumflex appeared in the JSP file instead of two question 
marks. I thought maybe I had won, but when I look at the .java source 
file that Tomcat generates, I see this instead:

out.write("<p>These have issues: â–º Å´ but these don&#39;t: &trade; 
&hArr; &diams; &aacute; &para; &yen;</p>\n");
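
The garbled sequence above is exactly what results when the UTF-8 bytes of the two characters are decoded as Cp1252. The sketch below reproduces that mechanism in isolation. It does not show where the misreading happens (in Tomcat, or in the tool used to view the .java file); the message does not settle that. The class name is hypothetical, and the windows-1252 charset is assumed to be available in the running JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class illustrating UTF-8 bytes misread as Cp1252.
final class MojibakeSketch {

    // Encodes the text as UTF-8, then decodes those bytes as Cp1252.
    static String misreadAsCp1252(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String original = "\u25BA \u0174"; // right arrow, space, W with circumflex
        String garbled = misreadAsCp1252(original);

        // Print code points so the result does not depend on the console encoding.
        StringBuilder sb = new StringBuilder();
        for (char c : garbled.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        System.out.println(sb.toString().trim());
        // prints: U+00E2 U+2013 U+00BA U+0020 U+00C5 U+00B4
    }
}
```

Those six code points are the characters seen in the generated out.write(...) line: three for the arrow, a space, and two for the W with circumflex.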

And when I view that in a web browser, I'm back to question marks again. 
View source in the browser shows:

<p>These have issues: ? ? but these don&#39;t:&trade;  &hArr;  &diams;
 &aacute;  &para;  &yen;</p>

So I figured the default character encoding of the JVM was causing me 
some grief. I checked, and the default on my Windows PC is Cp1252. But 
when I change this with the JVM argument -Dfile.encoding=UTF8, I am no 
better off: the JSP looks okay, but the generated .java file looks like 
the one above. I did note that with this setting I could go back to 
writing the JSP with FileWriter and it produced the correct JSP file, 
but the Tomcat-generated .java file was still wrong.
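
One way to confirm that the -Dfile.encoding argument actually reached a given JVM is to print what that JVM reports. This is a small sketch with a hypothetical class name; to be meaningful it would have to run inside the same JVM as Tomcat (for example from a servlet or JSP) rather than as a separate process.

```java
import java.nio.charset.Charset;

// Hypothetical sketch class; reports the encoding settings of the current JVM.
final class DefaultEncodingCheck {

    // Returns the file.encoding system property and the default charset
    // that classes such as FileWriter fall back to.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```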

What do I need to do to ensure that Tomcat reads my JSP with the 
correct encoding and writes the generated .java file with the correct 
encoding, so that these special characters appear correctly? It's not 
really Tomcat as a whole that is the issue, since CKEditor is running in 
Vaadin, which is running in Tomcat, and it looks fine there. But as soon 
as I run the generated JSP, the characters get lost and I end up with 
question marks instead.

Thanks for any ideas,
David
