Message-ID: <49789B6D.3090208@agssa.net>
Date: Thu, 22 Jan 2009 10:14:37 -0600
From: Antonio Gallardo
Organization: AG Software, S. A.
To: dev@cocoon.apache.org
Subject: Re: Entity escaping in o.a.c.c.serializers.XHTMLSerializer

Hi Andreas,

We hit the same issue some years ago and found a more pragmatic solution. In org.apache.cocoon.components.serializers.encoding.XHTMLEncoder, add the line marked with a + sign:

    private static final char ENCODINGS[][][] = {
+       { { 39 } , "&#39;".toCharArray() },
        { { 160 } , "&nbsp;".toCharArray() },

See: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Entities_representing_special_characters_in_XHTML

Please let me know if this fixes the issue; I will gladly commit the fix.

Best Regards,

Antonio Gallardo.

Andreas Hartmann wrote:
> Hi Cocoon devs,
>
> this issue has already been discussed several times, e.g. [1], but
> AFAIK has not been resolved yet.
>
> The XHTMLSerializer, or, more specifically, the XHTMLEncoder, from the
> serializers block in Cocoon 2.1.x escapes every character that has a
> corresponding HTML 4.0 character entity reference into that entity
> reference. This causes issues with inline JavaScript, since e.g. the
> double quotes are transformed to &quot;, which causes a JavaScript
> parsing error. Another minor negative effect is the increased document
> size.
>
> If I understand the W3C correctly, see e.g.
> [2], the recommended approach is to use the character set of the
> encoding as far as possible, and use escapes only in exceptional
> circumstances. I didn't find a reason why the XHTMLSerializer uses
> escapes, but I suspect that it is related to browser compatibility
> issues.
>
> Do you think it would make sense to make this behaviour configurable,
> e.g.
>
> true|false
>
> Does the XHTMLSerializer in Cocoon 2.2 show a different behaviour?
>
> TIA for any comments!
>
> -- Andreas
>
>
> [1] http://www.nabble.com/Problem-with-XHTMLSerializers-to1311360.html#a1311360
>
> [2] http://www.w3.org/International/tutorials/tutorial-char-enc/
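
For illustration, the table-driven lookup that the one-line fix relies on could be sketched roughly as below. This is a minimal, self-contained sketch and not the Cocoon source: the class name EntityTableSketch and the methods lookup and encode are hypothetical, and the two table entries (a numeric reference for character 39, a named entity for character 160) are assumed values chosen to mirror the shape of the ENCODINGS table quoted in the reply.

```java
public class EntityTableSketch {

    // Hypothetical stand-in for an ENCODINGS-style table: each entry pairs
    // a one-element char array (the character) with its replacement text.
    private static final char[][][] ENCODINGS = {
        { { 39 },  "&#39;".toCharArray() },   // apostrophe (assumed replacement)
        { { 160 }, "&nbsp;".toCharArray() },  // non-breaking space (assumed replacement)
    };

    /** Returns the replacement for c, or null when the table has no entry. */
    public static char[] lookup(char c) {
        for (char[][] entry : ENCODINGS) {
            if (entry[0][0] == c) {
                return entry[1];
            }
        }
        return null;
    }

    /** Replaces every character that has a table entry; copies the rest unchanged. */
    public static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char[] replacement = lookup(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The apostrophe and the non-breaking space are replaced; other characters pass through.
        System.out.println(encode("it's\u00A0here"));
    }
}
```

In this sketch, characters without a table entry (including the double quote) are passed through untouched; a real encoder would typically fall back to a parent class for those, which is where the behaviour Andreas describes would be decided.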