forrest-dev mailing list archives

From Sjur Moshagen <sju...@mac.com>
Subject Re: Latin1 character problems in dispatcher
Date Fri, 21 May 2010 10:05:45 GMT
On 21 May 2010, at 12:03, Thorsten Scherler wrote:

>> The text returned by that Uri is:
>> 
>> <?xml version="1.0" encoding="ISO-8859-1"?><div id="content"><h1>Divvun - Sámi proofing tools project</h1><div id="content-main">
>> 
>> 	  <div class="note"><div class="label">UTF-8 character test</div><div class="content">
>> 		There seems to be problems with certain characters, but only in
>> 		Dispatcher:<br xmlns:xi="http://www.w3.org/2001/XInclude"/>
>> 		a á c &#269; d &#273; n &#331; s &#353; t &#359; z &#382; ae æ oe ø ao å a¨ ä o¨ ö g &#485; h &#295; u &#649; i &#616;
>> 	  </div></div>
>> 
>> </div></div>
>> 
>> Two things to note here:
>> 
>> The encoding is specified as ISO-8859-1, which is wrong,
> 
> Yes, it should be UTF-8.
>> 

...

>> I don't know where the encoding comes from - everything on my end is marked as UTF-8.
>> I grepped for the string "ISO-8859-1" in the Forrest sources, and got many hits, but nothing
>> that seemed to relate to Dispatcher.
> 
> The *.body.xml comes from the dataModel.xmap:
> 
> <!-- HTML rendered from intermediate format -->
>      <map:match pattern="**.body.xml">
>        <map:generate src="cocoon:/{1}.source.rewritten.xml" />
>        <map:transform src="{lm:dataModel-html-document-to-html.xsl}">
>          <map:parameter name="path" value="{1}.html" />
>        </map:transform>
>        <map:serialize />
>      </map:match>
> 
> The serializer here is the default one.
> 
> We define it in the xmap as
> 
> <map:serializers default="xml" />
> 
> That should read:
> <map:serializers default="xml-utf8" />
> 
> I added this in revision 946939; please see whether that fixes the issue. I also added a test note
> to org.apache.forrest.plugin.internal.dispatcher/src/documentation/content/xdocs/index.xml
> so you can run "forrest run" directly in the plugin and see the outcome.

I tested it using my own site (the same document as earlier), and your change FIXED the bug :)

All instances of garbled UTF-8 characters are now fixed, both in the body text and elsewhere.
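
For the archive, a minimal Python sketch of the likely mechanism (this is not Forrest or Cocoon code, and the function names are made up for illustration). It assumes the consumer of the fragment treated the bytes as UTF-8, which the thread does not confirm. Serializing as ISO-8859-1 writes Latin-1 letters such as á as single bytes and anything outside Latin-1 as numeric character references, which matches the mixed output quoted earlier in this thread:

```python
def serialize(text: str, encoding: str) -> bytes:
    """Mimic an XML serializer: characters the target encoding cannot
    represent are written as numeric character references."""
    return text.encode(encoding, errors="xmlcharrefreplace")


def read_as_utf8(data: bytes) -> str:
    """Mimic a consumer that assumes UTF-8 (an assumption about the
    pipeline); bytes that are not valid UTF-8 become U+FFFD."""
    return data.decode("utf-8", errors="replace")


# "a with acute" is in Latin-1, "c with caron" is not.
sample = "\u00e1 \u010d"

latin1_bytes = serialize(sample, "iso-8859-1")
utf8_bytes = serialize(sample, "utf-8")

print(latin1_bytes)                       # b'\xe1 &#269;'
print(ascii(read_as_utf8(latin1_bytes)))  # '\ufffd &#269;'  (the first letter is lost)
print(utf8_bytes)                         # b'\xc3\xa1 \xc4\x8d'
print(read_as_utf8(utf8_bytes) == sample) # True
```

With a UTF-8 serializer both characters survive the round trip, which is consistent with the switch to "xml-utf8" fixing the output.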

Thanks a lot!

Best,
Sjur

