cocoon-users mailing list archives

Subject RE: Sudden difference in interpretation of #160 - bug?
Date Mon, 05 Jan 2004 13:42:54 GMT
Right, so far I came up with this:

In the resulting source, using 'serialize html', the NO-BREAK-SPACE shows up
as &nbsp; (i.e. the string, not the character); using 'serialize xhtml', it
shows up as &Acirc; (the character) followed by &nbsp; (the string).

Originally my XSL file added a meta tag ...iso-8859-1, but I removed it. No
effect. A close look at the page source shows:
Internet Explorer 6.0 SP2 adds a meta tag ...iso-8859-1 (when I changed the
meta tag in the XSL file to UTF-8, IE6 changed it to ISO-8859-1). I haven't
found any setting in IE6 where I could change the encoding.
Opera 7.21 displays the same results, with or without the meta tag.
In Opera I set the option 'encoding to assume for pages lacking encoding' to
utf-8, but still got the same result.

I use both browsers on my home PC too, with the same results (I cannot say
whether they are the exact same versions, but it is IE6.x and Opera 7.x).

> -----Original Message-----
> From: Marc Portier
> Sent: Monday, 05 January 2004 12:01
> To:
> Subject: Re: Sudden difference in interpretation of #160 - bug?
> wrote:
> > Hi,
> > 
> > Thanks. I've read the article, in fact I read the entire thread, but either
> > New Year's wine is still in my system (I don't drink :-)) or I've stumbled
> > onto a configuration problem/bug in <map:serialize type='xhtml'/>
> > 
> > Point is this: whether I enter &#160; or &#xA0; directly in my XSL
> > stylesheet or through an entity reference
> > <!ENTITY nbsp "&#160;"> (or the hex code), when I tell the serializer to
> > use type xhtml I get an &Acirc; instead of &nbsp;. When I change the type back to
> I take it you refer to &Acirc; just to communicate here, and that it is
> not actually in the resulting file, right?
> That would be the most interesting thing to see now: what actually is in
> that file (and not what is showing up in the browser).

How do I do this? I've tried a cocoon-view, but it shows the resulting page.
And there is no start page: it starts out as an aggregation of XML files
from various sources, which are finally processed by an XSL file that
transforms the XML into (X)HTML. I've just tried adding a source:write, and
the result is that &nbsp; is represented by A0 characters.
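Marc's question is about the bytes in the file, not what the browser renders. One way to answer it, sketched here in Python under the assumption that the serialized output has been saved to a local file (the helper `non_ascii_bytes` is hypothetical and not part of Cocoon), is to list every byte above 0x7F. A lone `A0` byte points to ISO-8859-1 output, while the pair `C2 A0` is the UTF-8 encoding of U+00A0.

```python
def non_ascii_bytes(data: bytes) -> list:
    """Return (offset, hex) pairs for every byte outside 7-bit ASCII."""
    return [(i, f"{b:02X}") for i, b in enumerate(data) if b > 0x7F]


# Illustrative input: "a<NO-BREAK SPACE>b" as each encoding would write it.
latin1_output = "a\u00a0b".encode("iso-8859-1")
utf8_output = "a\u00a0b".encode("utf-8")

print(non_ascii_bytes(latin1_output))  # [(1, 'A0')]
print(non_ascii_bytes(utf8_output))    # [(1, 'C2'), (2, 'A0')]
```

On a real file the call would be something like `non_ascii_bytes(open("page.html", "rb").read())`, where `page.html` stands for wherever the output was written.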

> When the A with ^ shows up, it often means the browser received UTF-8 but
> thinks it is ISO-8859-1 anyway... As it happens (no need to go into the
> depths of how the encoding works), some characters in UTF-8 are encoded
> with more than one byte, in which case the leading byte often maps to one
> of Latin-1's accented-A characters (just so you recognise the disease).
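The effect Marc describes can be reproduced in a few lines of Python: encode a character as UTF-8, then decode those bytes as ISO-8859-1, which is what a browser assuming Latin-1 effectively does. This is a minimal sketch (the helper name `misread_as_latin1` is hypothetical). For the three characters mentioned in this thread, the UTF-8 lead byte is `C2`, which Latin-1 reads as Â, so each comes out as Â followed by the intended character.

```python
def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode as ISO-8859-1, mimicking a browser
    that receives UTF-8 but assumes Latin-1."""
    return text.encode("utf-8").decode("iso-8859-1")


# The three characters mentioned in this thread.
for ch, name in (("\u00a0", "nbsp"), ("\u00a9", "copy"), ("\u00bb", "raquo")):
    raw = ch.encode("utf-8")  # two bytes, the first one 0xC2
    print(name, raw.hex(" ").upper(), repr(misread_as_latin1(ch)))
```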

This is exactly what is happening here: all "special" characters like &nbsp;,
&copy; and &raquo; are prefixed with an &Acirc;. Only &nbsp; shows up as a
string, the rest as the character.

> Could you check the encoding your browser is assuming, and compare it with
> what the page or the HTTP headers are saying?

How? As said above: IE6 adds/modifies the meta tag to iso-8859-1. Opera
should use utf-8, but still shows the &Acirc;.
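To answer "what are the page and the HTTP headers saying" without relying on the browser, one could compare the `charset` in the `Content-Type` response header with any charset named in a `<meta>` tag. A minimal Python sketch under that assumption; the helper names `header_charset` and `meta_charset` are hypothetical, and the regex is a rough heuristic rather than a full HTML parser.

```python
import re
from email.message import Message
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Charset parameter of a Content-Type header value (lower-cased), or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# Matches e.g. <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
_META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def meta_charset(html: bytes) -> Optional[str]:
    """Charset named in the first <meta ...charset=...> tag, lower-cased, or None."""
    m = _META_CHARSET.search(html)
    return m.group(1).decode("ascii").lower() if m else None


page = b'<head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head>'
declared_by_server = header_charset("text/html; charset=UTF-8")
declared_in_page = meta_charset(page)
if declared_by_server != declared_in_page:
    print(f"mismatch: header says {declared_by_server}, page says {declared_in_page}")
```

Against a live server, the header side could come from `urllib.request.urlopen(url).headers.get_content_charset()`, since the response headers object is an `email.message.Message` subclass.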

> > 'html' it works as expected.
> > 
> It can be a number of reasons:
> - I think by default the html serializer is set to use iso-8859-1 as the
>   target encoding?

I wouldn't know. I've checked the Cocoon sitemap.xmap (default Cocoon 2.1.3),
and it has no encoding info for the html serializer.

> - I think the html serializer (from Xalan) would be introducing the &nbsp;

Must be, since the resulting source shows &nbsp; (the string) for all
instances, while other characters such as &copy; and &raquo; show up as the
character.
Bye, Helma
