forrest-dev mailing list archives

From "Sjur N. Moshagen (JIRA)" <>
Subject [jira] Commented: (FOR-435) Wiki input files (*.jspwiki) is not correctly read when in UTF-8
Date Tue, 30 May 2006 11:38:30 GMT

Sjur N. Moshagen commented on FOR-435:

Earlier investigation in our project has shown that the Chaperon grammar is using the default
Java file encoding when reading files, and that the default Java encoding is given by the
OS, in our case MacOS X, which has MacRoman as default. Reading UTF-8 encoded files as MacRoman
will of course garble non-ASCII characters.
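To make the mechanism concrete, here is a minimal, generic Java sketch. It is not Chaperon's actual code; the class and method names are hypothetical. A Reader built without a charset argument falls back to the platform default, so the decoded text depends on the OS. Naming the charset explicitly removes that dependency. ISO-8859-1 stands in for MacRoman here because "x-MacRoman" is not among the charsets every Java implementation is required to support. The sketch needs Java 7 or later (StandardCharsets, try-with-resources), unlike the Java 1.4.2 environment in this report.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetRead {

    /** Reads the whole stream, decoding with the given charset instead of the platform default. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Illustrative sample text with two non-ASCII letters, stored as UTF-8 bytes
        // (a stand-in for the contents of a *.jspwiki file).
        byte[] utf8Bytes = "bl\u00e5b\u00e6r".getBytes(StandardCharsets.UTF_8);

        // Correct: the charset matches how the file was written.
        String right = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        // Wrong: a single-byte charset turns each UTF-8 byte into its own character.
        String wrong = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1);

        System.out.println(right.length() + " chars vs " + wrong.length() + " chars"); // 6 chars vs 8 chars
    }
}
```

The same bytes yield 6 characters when decoded as UTF-8 but 8 when decoded with a single-byte charset, which is exactly the kind of garbling described above.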

Today I put some effort into finding a work-around based on this insight, and the result is
the following command line argument:

forrest run -Dforrest.jvmargs="-Dfile.encoding=utf-8"

It doesn't really solve the underlying problem of configuring Chaperon from within Forrest
(or Cocoon), but it does solve our actual problem through a work-around.
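The workaround only helps if the flag actually reaches the JVM that runs Chaperon, so it can be useful to check what a JVM reports. The small program below is a hypothetical helper, not part of Forrest. It prints the file.encoding property and the default charset that readers use when no charset is given. Running it as `java -Dfile.encoding=utf-8 ShowDefaultEncoding` should report UTF-8. Charset.defaultCharset() requires Java 5 or later, so on the Java 1.4.2 setup from this report the program would have to be reduced to the System.getProperty line. On JDK 18 and later the default charset is UTF-8 (JEP 400), which makes this particular workaround unnecessary there.

```java
import java.nio.charset.Charset;

public class ShowDefaultEncoding {

    /** Name of the charset that readers fall back to when none is passed explicitly. */
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Set by -Dfile.encoding=... on the command line, otherwise derived from the OS.
        System.out.println("file.encoding            = " + System.getProperty("file.encoding"));
        // What new InputStreamReader(in), without a charset argument, will actually use.
        System.out.println("Charset.defaultCharset() = " + defaultCharsetName());
    }
}
```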

> Wiki input files (*.jspwiki) is not correctly read when in UTF-8
> ----------------------------------------------------------------
>          Key: FOR-435
>          URL:
>      Project: Forrest
>         Type: Bug
>   Components: Plugin:
>     Versions: 0.8-dev, 0.7
>  Environment: MacOS X, 10.3.8, Java 1.4.2
>     Reporter: Sjur N. Moshagen

> According to the documentation at:
> it should be possible to configure the Wiki plugin (or any plugin based on Chaperon)
> for different encodings of the input file, in my case UTF-8.
> But this does not work. I have:
>       <map:transformer name="lexer"
>                        src="org.apache.cocoon.transformation.LexicalTransformer"
>                        logger="sitemap.transformer.lexer">
>         <map:parameter name="localizable" value="true"/>
>         <map:parameter name="encoding" value="UTF-8"/>
>       </map:transformer>
> in the input.xmap file in $FORREST_HOME/plugins/wiki, and I have run "ant local-deploy",
> but to no avail: multibyte UTF-8 sequences come out as the Latin-1 counterpart of each byte
> in the sequence.
> A discussion about this bug can be found at:
> [mail archive not yet updated, will add link here later]
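The symptom in the report (each byte of a multibyte UTF-8 sequence showing up as its own Latin-1 character) can be reproduced in a few lines of Java. This is an illustrative sketch with hypothetical names, not code from Forrest, Cocoon or Chaperon. Because ISO-8859-1 maps every byte value 0x00 to 0xFF to exactly one character, the damage is lossless and can be reversed by re-encoding as ISO-8859-1 and decoding as UTF-8. Java 7 or later is assumed for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Mojibake {

    /** Simulates the bug: UTF-8 bytes decoded as if they were ISO-8859-1 (one char per byte). */
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Reverses the damage: recover the original bytes, then decode them correctly as UTF-8. */
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample word; U+00F8 (o with stroke) is the two bytes 0xC3 0xB8 in UTF-8.
        String word = "S\u00f8r";
        // Misdecoded, those two bytes become U+00C3 (A with tilde) and U+00B8 (cedilla).
        String garbled = misdecode(word);
        System.out.println(garbled.equals("S\u00c3\u00b8r")); // true
        System.out.println(repair(garbled).equals(word));     // true
    }
}
```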
