httpd-dev mailing list archives

From Nick Kew <n...@webthing.com>
Subject Re: suggestion: strip comments from served html pages
Date Mon, 08 Nov 2004 05:02:30 GMT
On Sun, 7 Nov 2004, André Malo wrote:

> * Nick Kew <nick@webthing.com> wrote:
>
> > BTW, the "what is a comment" problem is easier than it looks, as both
> > <script> and <style> are declared in HTML as having CDATA content.
> > That makes it trivial to distinguish them from "inert" comments.
>
> but in xhtml it's PCDATA, which makes them real xml comments...

So long as we're in web-browser-compatible land, we can parse XHTML with
an HTML parser that knows about CDATA.  And when we move out of it,
we're also leaving commented <script> and <style> contents behind.
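
A minimal sketch of that idea, for illustration only. It assumes Python's
standard html.parser (not any of the Apache modules discussed in this
thread), which treats <script> and <style> as CDATA content, so a "<!-- -->"
inside them arrives as plain data rather than as a comment. The names
CommentStripper and strip_comments are hypothetical. The output is not
byte-exact: end tags are re-emitted in lowercase and entity references
always get a trailing semicolon.

```python
from html.parser import HTMLParser


class CommentStripper(HTMLParser):
    """Re-emit markup, dropping only what the parser reports as a comment."""

    def __init__(self):
        # convert_charrefs=False: entities reach the handlers below
        # instead of being decoded into the text.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Reuse the start tag's original source text.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, as written.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Includes the raw body of <script> and <style>, which the
        # parser reads in CDATA mode.
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def handle_comment(self, data):
        # "Inert" comments outside script/style are discarded.
        pass


def strip_comments(html):
    """Return html with comments removed, script/style bodies untouched."""
    parser = CommentStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Fed '<p>a<!-- inert -->b</p><script><!--\nvar x = 1;\n//--></script>', the
comment inside <p> is dropped while the commented-out script body is passed
through unchanged, because the parser never reports it as a comment.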

I'll grant there are other pathological edge-cases due to the ways
people abuse markup.  That's one very good reason none of the modules
I mentioned defaults to stripping comments :-)

-- 
Nick Kew
