tomcat-users mailing list archives

From Asok Chattopadhyay <da.a...@gmail.com>
Subject Re: Tomcat strips CRLFs from the generated page
Date Tue, 14 Jan 2014 20:18:44 GMT
It looks like the problem may be caused by scripts being inserted
into the page by an external domain. I am investigating further along
that line.
Thanks, everybody.



On Tue, Jan 14, 2014 at 11:02 PM, Christopher Schultz <
chris@christopherschultz.net> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> André,
>
> On 1/13/14, 9:36 AM, André Warnier wrote:
> > Asok Chattopadhyay wrote:
> >> Hi,
> >>
> >> My servlet generates a page containing embedded JavaScript and
> >> sometimes the page received in the browser comes with CRLFs
> >> stripped from the text, starting at some point in the text. This
> >> creates a big problem if the script uses CRLF as a statement
> >> separator instead of a semicolon. Strangely, not the entire
> >> text is stripped. Say, the first 150 lines arrive as they are,
> >> whereas starting from line 151, all the CRLFs are stripped. It is
> >> fairly consistent for the same page.
> >>
> >> I am using Tomcat 6.0.37.
> >>
> >> Why does it happen? Does anything in the text trigger this? Is
> >> there a way to overcome this problem, as I don't have control
> >> over the actual content?
> >>
> >>
> >> Thanks in advance.
> >>
> >> Here is the example.
> >>
> >>
> >> LINE 148:    <script type="text/javascript"
> >> SRC="html/scripts/combotext.js"></script>
> >>
> >> LINE 149:    <script type="text/javascript"
> >> SRC="html/scripts/datepicker.js"></script>
> >>
> >> LINE 150:    <script type="text/javascript"
> >> SRC="html/scripts/combo.js"></script>
> >>
> >> LINE 151:    <script type="text/javascript"
> >> SRC="html/scripts/calc.js"></script>    <script
> >> type="text/javascript"
> >> SRC="html/scripts/dream.js"></script><script
> >> language="javascript" type="text/javascript">var buttonfunction
> >> clicked(b){button=b.value}function submitit(form){if
> >> (button=="Details"){form.page.value =
> >> "opcdt"form.searchbutton.value = "Top"}}function
> >> pickProduct(link, cus){if (navigator.appName.indexOf("Netscape") >=
> >> 0){document.one.xinvnum.value=link.textdocument.one.xcus.value=cus.text}else{document.one.xinvnum.value=link.innerTextdocument.one.xcus.value=cus.innerText}return
> >> false}</script></head><body
> >> onload="topBottom();move_caret('one','xcrnnum');"
> >> style="margin:0;padding:0;"><!--<div id="darkBackgroundLayer"
> >> class="darkenBackground"> ...
> >>
> >
> > Hi. I have to disregard your example above, because once reformatted
> > by the various email agents in between, there is no telling anymore
> > what the original was like.
> >
> > In general, are you not just a victim of the following
> > circumstances?
> >
> > In HTML (and I suppose that this also includes embedded
> > <script>..</script> sections), an "end of line" is composed of 2
> > characters: a "carriage return" (CR) and a "line feed" (LF).
>
> HTML actually doesn't care. One could argue that because HTTP uses
> CR+LF as line-endings for things like headers that HTML documents
> should work the same way, but that's not a part of the standard (at
> least not that I could find, and I wasn't going to pay for a copy of
> the SGML standard to check, either).
>
> I'm not really sure why the OP is so concerned about line endings.
> Tomcat is not stripping them. This code seems particularly dubious:
>
>                         temp.readFully(b);
>                         fi.close();
>                         s += new String(b);
>
> That last line will convert an array of bytes into a String value.
> Newlines should not be stripped, but other kinds of characters
> (basically anything not in the current default encoding) will likely
> suffer.
>
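A minimal sketch of the encoding point above, assuming the file being served is UTF-8 (the thread does not say which encoding the resource actually uses). `CharsetDecode` and `decode` are hypothetical names, not part of the original servlet. `StandardCharsets` needs Java 7 or later; on older runtimes `new String(b, "UTF-8")` should behave the same way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the original servlet.
final class CharsetDecode {

    // Decode with an explicitly named charset instead of relying on the
    // platform default, which is what new String(byte[]) uses.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Sample script text with a non-ASCII character and CRLF line endings.
        byte[] b = "var s = \"caf\u00e9\"\r\nalert(s)\r\n"
                .getBytes(StandardCharsets.UTF_8);

        String text = decode(b, StandardCharsets.UTF_8);

        // Decoding does not touch CR or LF; only characters that the chosen
        // charset cannot represent are at risk.
        System.out.println(text.contains("\r\n"));
    }
}
```

The writer on the response side would also need a matching encoding (for example via `response.setCharacterEncoding`), otherwise the same kind of damage can occur on output.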
> > When you are running a Java program, the "println" function
> > probably appends an "end of line" sequence to what you are
> > printing, but this end of line sequence depends on the platform on
> > which your Java program runs:
> > - on a Windows system, the end of line sequence would be CR + LF
> > - on a Unix/Linux system, it will be just LF
> > - on a Mac system, it would be just CR
>
> +1
>
> This line of code introduces an artificial newline at the end of the file:
>
>                 out.println(s);
>
> That is, it mutates the resource as it is being served. :(
>
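To illustrate the remark about `out.println(s)`, here is a small sketch that compares `print` and `println` on a `PrintWriter`. It writes into a `StringWriter` rather than a servlet response so that it runs standalone; the class and method names are made up for the illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical demo class; it stands in for the servlet's response writer.
final class PrintVsPrintln {

    // Writes the content with print(): nothing is appended.
    static String viaPrint(String content) {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(sink);
        out.print(content);
        out.flush();
        return sink.toString();
    }

    // Writes the content with println(): the platform line separator
    // (the "line.separator" system property) is appended.
    static String viaPrintln(String content) {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(sink);
        out.println(content);
        out.flush();
        return sink.toString();
    }

    public static void main(String[] args) {
        String resource = "function f(){}\r\n";
        int extra = viaPrintln(resource).length() - viaPrint(resource).length();
        System.out.println("characters appended by println: " + extra);
    }
}
```

If the resource is meant to be served unchanged, `print` (or writing the raw bytes to the output stream) avoids the extra trailing separator.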
>
> > So depending on what platform your server is running, and on what
> > platform the client is running, they may interpret the result
> > differently.
> >
> > In other words, if your java application is outputting HTML and/or
> > javascript, you should probably not be using "println".  You should
> > be using "print", and explicitly append a CR and LF to your output
> > lines. ("\r\n" ?).
>
> ... not that it matters very much, because the client should not care.
>
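If consistent CR+LF line endings are wanted anyway, one option is to normalize the text before writing it with `print`. The following is only a sketch under that assumption; `LineEndings` and `toCrLf` are hypothetical names, and, as noted above, browsers should accept any of the three conventions.

```java
// Hypothetical helper for normalizing line endings before output.
final class LineEndings {

    // Replaces every CRLF, lone CR, or lone LF with CRLF. The alternation
    // tries "\r\n" first, so an existing CRLF is matched as one unit and
    // is not doubled.
    static String toCrLf(String text) {
        return text.replaceAll("\r\n|\r|\n", "\r\n");
    }

    public static void main(String[] args) {
        String mixed = "var a = 1\nvar b = 2\rvar c = 3\r\n";
        String normalized = toCrLf(mixed);
        // Expose the control characters so the result is visible on a console.
        System.out.println(normalized.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```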
> - -chris
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
> Comment: GPGTools - http://gpgtools.org
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQIcBAEBCAAGBQJS1VFsAAoJEBzwKT+lPKRYe9YQAK0Y0HdxqLJd3Ob1dvknE66V
> T6PvurGvNVDqm6hhHaA9OUD8gXm+Z5PkASlhdBtKjjctlULyt17SzpMAVEpIHiLd
> ucPo1oJjVpgFtma3cjxdAABTZB7GnQzXcUG71s7lujEpkkzK2AaGtvDFihiv41dE
> oHrUp8BJLBD/hCC0gwsVFyZ9utrU+eoklXest7nI+4pmdPRQiPl5IN3ldOVK/P90
> RMqg7u6spOhVGWevHlpfKiY2r1ocxwvT3x7OVHHVtDvYfhqNRELwtCYI3dZ+fSlx
> vdQ9n3vmooDfh4Sw6uPqopVGx45fFcTzQXsS4YG2B6ZLY0IFlGaOlLngAS0cURot
> acz6O9acsJEqj2/kk9dAoa0fIO08UaUs7wK27Qtgzdj3FTyrjh7/GjgzvSkRmJwa
> CtdMLaSZdSTtTJG8JmqnTaX9KfS8bAXDlzxiXrg+6ORQh7XKcxPD982kUTNFx8gP
> KI/hyfOLo+aqG9i21Iio+G4D/CEUDywfMyPoQOmEUtFUSCiP0nNqjXbFtYD1TcR1
> 5IjvQYx59E69FnMEepcPtcQGHcC3PJu7/TpcrVRwq35T9q1Cai+WUDXcpWd2YpY1
> 2v+H4ulnWC1KPw/xuoOVk7jQMT7gQx/HQsHOOXDOIDIqqJn7kM/Zj9cqwAJPNbp/
> PC8B75X2WvfYmXbuAsRb
> =ffYk
> -----END PGP SIGNATURE-----
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
>
>
