tomcat-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Tomcat strips CRLFs from the generated page
Date Tue, 14 Jan 2014 15:02:04 GMT
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

André,

On 1/13/14, 9:36 AM, André Warnier wrote:
> Asok Chattopadhyay wrote:
>> Hi,
>> 
>> My servlet generates a page containing embedded JavaScript and
>> sometimes the page received in the browser comes with CRLFs
>> stripped from the text, starting at some point in the text. This
>> creates a big problem if the script contains CRLF as statement
>> separator instead of a semicolon. Strangely, not all of the text is
>> stripped. Say, the first 150 lines come through as they are,
>> whereas starting from line 151, all the CRLFs are stripped. It is
>> fairly consistent for the same page.
>> 
>> I am using Tomcat 6.0.37.
>> 
>> Why does it happen? Does anything in the text trigger this? Is
>> there a way to overcome this problem, as I don't have control
>> over the actual content?
>> 
>> 
>> Thanks in advance.
>> 
>> Here is the example.
>> 
>> 
>> LINE 148:    <script type="text/javascript" 
>> SRC="html/scripts/combotext.js"></script>
>> 
>> LINE 149:    <script type="text/javascript" 
>> SRC="html/scripts/datepicker.js"></script>
>> 
>> LINE 150:    <script type="text/javascript" 
>> SRC="html/scripts/combo.js"></script>
>> 
>> LINE 151:    <script type="text/javascript" 
>> SRC="html/scripts/calc.js"></script>    <script
>> type="text/javascript" 
>> SRC="html/scripts/dream.js"></script><script
>> language="javascript" type="text/javascript">var buttonfunction 
>> clicked(b){button=b.value}function submitit(form){if 
>> (button=="Details"){form.page.value =
>> "opcdt"form.searchbutton.value = "Top"}}function
>> pickProduct(link, cus){if (navigator.appName.indexOf("Netscape")
>> >= 
>> 0){document.one.xinvnum.value=link.textdocument.one.xcus.value=cus.text}else{document.one.xinvnum.value=link.innerTextdocument.one.xcus.value=cus.innerText}return
>> false}</script></head><body
>> onload="topBottom();move_caret('one','xcrnnum');" 
>> style="margin:0;padding:0;"><!--<div id="darkBackgroundLayer" 
>> class="darkenBackground"> ...
>> 
> 
> Hi. I have to disregard your example above, because once reformatted
> by the various email agents in between, there is no telling anymore
> what the original was like.
> 
> In general, are you not just a victim of the following circumstances?
> 
> In HTML (and I suppose that this also includes embedded
> <script>..</script> sections), an "end of line" is composed of 2
> characters: a "carriage return" (CR) and a "line feed" (LF).

HTML actually doesn't care. One could argue that because HTTP uses
CR+LF as line-endings for things like headers that HTML documents
should work the same way, but that's not a part of the standard (at
least not that I could find, and I wasn't going to pay for a copy of
the SGML standard to check, either).

I'm not really sure why the OP is so concerned about line endings.
Tomcat is not stripping them. This code seems particularly dubious:

			temp.readFully(b);
			fi.close();
			s += new String(b);

That last line converts an array of bytes into a String using the
platform's default charset. Newlines should not be stripped, but other
kinds of characters (basically anything the file encodes differently
from that default charset) will likely suffer.
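
As an illustration of the point above, here is a minimal sketch that
decodes the same bytes with two different charsets. The class name
DecodeDemo and the sample text are hypothetical, and StandardCharsets
assumes Java 7 or later. It is not the OP's code, only a way to see
that CR+LF survives decoding while a non-ASCII character does not
survive the wrong charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decode with an explicitly named charset instead of relying on
    // the platform default that new String(byte[]) would use.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "café" followed by CR+LF, encoded as UTF-8 (hypothetical sample).
        byte[] b = "caf\u00e9\r\nline2\r\n".getBytes(StandardCharsets.UTF_8);

        String right = decode(b, StandardCharsets.UTF_8);
        String wrong = decode(b, StandardCharsets.ISO_8859_1);

        // The line endings are still present in both results...
        System.out.println(right.contains("\r\n"));
        System.out.println(wrong.contains("\r\n"));
        // ...but the accented character only survives the matching charset.
        System.out.println(right.startsWith("caf\u00e9"));
        System.out.println(wrong.startsWith("caf\u00e9"));
    }
}
```

Passing the charset explicitly makes the result independent of the
machine the servlet happens to run on.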

> When you are running a java program, the "println" function
> probably appends an "end of line" sequence to what you are
> printing, but this end of line sequence depends on the platform on
> which your java program runs:
> - on a Windows system, the end of line sequence would be CR + LF
> - on a Unix/Linux system, it will be just LF
> - on a Mac system, it would be just CR

+1
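
A small sketch of that platform dependence, assuming "out" behaves
like a java.io.PrintWriter (for example the one returned by
response.getWriter()). The class LineEndingDemo and its method names
are made up for illustration, and System.lineSeparator() assumes
Java 7 or later.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class LineEndingDemo {
    // Returns exactly what PrintWriter.println(s) writes: the text
    // followed by the platform's line separator.
    static String viaPrintln(String s) {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(sink);
        out.println(s);
        out.flush();
        return sink.toString();
    }

    // Returns what print(s) plus an explicit CR+LF writes: the same
    // characters on every platform.
    static String viaPrintCrlf(String s) {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(sink);
        out.print(s);
        out.print("\r\n");
        out.flush();
        return sink.toString();
    }

    public static void main(String[] args) {
        // println output ends with whatever this platform uses.
        System.out.println(
            viaPrintln("x").equals("x" + System.lineSeparator()));
        // The explicit variant always ends with CR+LF.
        System.out.println(viaPrintCrlf("x").equals("x\r\n"));
    }
}
```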

This line of code introduces an artificial newline at the end of the file:

		out.println(s);

That is, it mutates the resource as it is being served. :(
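
If the goal is to serve the file unchanged, one possible approach is
to copy its bytes straight to the response without ever building a
String. The sketch below is an assumption about what the servlet is
trying to do, not the OP's code: RawCopy and copy() are hypothetical
names, and in a servlet the destination would presumably be
response.getOutputStream(). In-memory streams are used here so the
example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RawCopy {
    // Copy every byte from in to out without decoding or appending
    // anything; returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical script content that relies on CR+LF separators.
        byte[] original =
            "var a = 1\r\nvar b = 2\r\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        long copied = copy(new ByteArrayInputStream(original), sink);

        // Same length and same bytes: no newline added, none removed.
        System.out.println(copied == original.length);
        System.out.println(Arrays.equals(original, sink.toByteArray()));
    }
}
```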


> So depending on what platform your server is running, and on what 
> platform the client is running, they may interpret the result
> differently.
> 
> In other words, if your java application is outputting HTML and/or 
> javascript, you should probably not be using "println".  You should
> be using "print", and explicitly append a CR and LF to your output
> lines. ("\r\n" ?).

... not that it matters very much, because the client should not care.

- -chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJS1VFsAAoJEBzwKT+lPKRYe9YQAK0Y0HdxqLJd3Ob1dvknE66V
T6PvurGvNVDqm6hhHaA9OUD8gXm+Z5PkASlhdBtKjjctlULyt17SzpMAVEpIHiLd
ucPo1oJjVpgFtma3cjxdAABTZB7GnQzXcUG71s7lujEpkkzK2AaGtvDFihiv41dE
oHrUp8BJLBD/hCC0gwsVFyZ9utrU+eoklXest7nI+4pmdPRQiPl5IN3ldOVK/P90
RMqg7u6spOhVGWevHlpfKiY2r1ocxwvT3x7OVHHVtDvYfhqNRELwtCYI3dZ+fSlx
vdQ9n3vmooDfh4Sw6uPqopVGx45fFcTzQXsS4YG2B6ZLY0IFlGaOlLngAS0cURot
acz6O9acsJEqj2/kk9dAoa0fIO08UaUs7wK27Qtgzdj3FTyrjh7/GjgzvSkRmJwa
CtdMLaSZdSTtTJG8JmqnTaX9KfS8bAXDlzxiXrg+6ORQh7XKcxPD982kUTNFx8gP
KI/hyfOLo+aqG9i21Iio+G4D/CEUDywfMyPoQOmEUtFUSCiP0nNqjXbFtYD1TcR1
5IjvQYx59E69FnMEepcPtcQGHcC3PJu7/TpcrVRwq35T9q1Cai+WUDXcpWd2YpY1
2v+H4ulnWC1KPw/xuoOVk7jQMT7gQx/HQsHOOXDOIDIqqJn7kM/Zj9cqwAJPNbp/
PC8B75X2WvfYmXbuAsRb
=ffYk
-----END PGP SIGNATURE-----


