cocoon-users mailing list archives

From "Anna Afonchenko" <>
Subject Re: Comments escaped in serializer
Date Tue, 29 Apr 2003 06:51:31 GMT
Hi J.Pietschmann.
Thank you very much for your answer.
The problem is that the HTML pages that I want to transform
to XML/XHTML are not mine. I download them from different sites,
and thus have no control over how the scripts are written in those pages.
So none of your suggestions is possible for me :-(
Isn't there a workaround that will force XMLSerializer not to escape < and >
within scripts?
I don't think that I am the first one encountering a problem like this.
It is quite common to want to transform some arbitrary HTML page into
valid XHTML, keeping all the functionality.
What can I do in a case like this?

Thank you very much for your help.


----- Original Message -----
From: "J.Pietschmann" <>
To: <>
Sent: Monday, April 28, 2003 5:56 PM
Subject: Re: Comments escaped in serializer

Anna Afonchenko wrote:
> I am trying to construct a pipeline that will get in some online page and
> present it.
> The page has some scripts like:
> <script type="text/javascript">
>     <!--
>         javascript code
>     //-->
> </script>
> Now here is my problem. If I serialize the page as html, everything is
> fine. But when I try to serialize the same page as xml or xhtml, the
> scripts stop working. I think that I know the reason for this.
> When serializing the page as xhtml, the < and > signs of the comment
> inside the script are turned into the entities &lt; and &gt;. When the
> page is serialized as html, the comments stay valid.

HTML serialization specifies that < and > are not escaped within
<script> and a few other elements. In XML (or XHTML, which is XML
too), escaping is *required*.
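
The difference can be seen in a small sketch. It uses Python's
standard-library ElementTree rather than Cocoon's serializers, so it only
illustrates the general rule; the script text is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical <script> element whose text contains characters
# that have a special meaning in XML markup.
script = ET.Element("script", {"type": "text/javascript"})
script.text = "if (a < b && b > c) { run(); }"

# XML serialization: <, > and & in text content are escaped.
as_xml = ET.tostring(script, encoding="unicode", method="xml")

# HTML serialization: text inside <script> (and <style>) is written raw.
as_html = ET.tostring(script, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```

An XML parser reading the first form recovers the original script text,
but a browser treating the same bytes as HTML sees the literal entities
inside the script, which is why the scripts stop working.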
Possible workarounds, none of which is perfect:
1. Don't use the SGML comment within <script>; more than 90% of
   users have browsers which don't need it anymore anyway.
2. use
  <script type="text/javascript">
          javascript code
   This has drawbacks: you must not use the -- operator in your
   javascript, and the processor may scramble line breaks.
3. Instead of embedding the JS in your page, use <script src="...">
   to refer to an external JS document.
There is *no* way to meet all your possible requirements.


