forrest-dev mailing list archives

From Thorsten Scherler <thorsten.scher...@wyona.com>
Subject Re: [jira] Commented: (FOR-435) Wiki input files (*.jspwiki) is not correctly read when in UTF-8
Date Wed, 31 May 2006 07:33:35 GMT
On Wed, 31-05-2006 at 09:45 +0300, Sjur Moshagen wrote:
> On 31 May 2006 at 04:10, David Crossley wrote:
> 
> > Sjur N. Moshagen (JIRA) wrote:
> >>
> >> Sjur N. Moshagen commented on FOR-435:
> >> --------------------------------------
> >>
> >> Earlier investigation in our project has shown that the Chaperon  
> >> grammar is using the default Java file encoding when reading  
> >> files, and that the default Java encoding is given by the OS, in  
> >> our case MacOS X, which has MacRoman as default. Reading UTF-8  
> >> encoded files as MacRoman will of course garble non-ASCII characters.
> >
> > Perhaps this issue also affects the core of Forrest.
> > IIUC from our sitemaps, we use Chaperon to extract
> > links from CSS files.
> 
> It sounds at least like a potential source of problems.
> 

I am not 100% sure, but this observation could explain FOR-492 as well.

> >> Today I put some effort into finding a work-around based on this  
> >> insight, and the result is the following command line argument:
> >>
> >> forrest run -Dforrest.jvmargs="-Dfile.encoding=utf-8"
> >
> > Perhaps this should be an available forrest property.
> 
> That would be very nice, although it should be made clear in the  
> documentation that it can affect more than Chaperon. The parameter  
> overrides the OS-provided default file encoding, and sets the  
> specified file encoding as default for the Java VM. Thus, all file  
> readers not specifying the encoding will use it.

We should test generating our site on Windows and Linux with
-Dforrest.jvmargs="-Dfile.encoding=utf-8" set. Maybe for the time being
we can add this to the forrest.properties of site-author and see
whether that resolves FOR-492.
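
To illustrate what the quoted explanation means in practice, here is a
minimal sketch (not Chaperon or Forrest code). The class name
EncodingDemo and the sample word are made up for illustration, and it
assumes Java 8 or later, which is newer than the Java 1.4.2 in the
report. A reader that falls back to the JVM default charset is at the
mercy of file.encoding, while a reader given an explicit charset is not:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Forrest, Cocoon or Chaperon.
public class EncodingDemo {

    // Decode the bytes through a Reader that uses the given charset.
    static String decode(byte[] bytes, Charset charset) {
        try (Reader reader =
                 new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "blåbær" written with escapes so the source file encoding does not matter.
        byte[] utf8 = "bl\u00e5b\u00e6r".getBytes(StandardCharsets.UTF_8);

        // What a reader without an explicit encoding would use.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default decode:  " + decode(utf8, Charset.defaultCharset()));

        // Independent of -Dfile.encoding and of the OS default.
        System.out.println("UTF-8 decode:    " + decode(utf8, StandardCharsets.UTF_8));
    }
}
```

Running this with and without -Dfile.encoding=utf-8 on the java command
line should show whether the "default decode" line depends on the
platform default on a given JVM.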

> 
> >> It doesn't really solve the underlying problem of configuring  
> >> Chaperon from within Forrest (or Cocoon), but it does solve our  
> >> actual problem through a work-around.
> >
> > Does the Cocoon chaperon block need some configurability
> > added?
> 
> AFAIR (it is a long time since I tried this), the Chaperon  
> documentation claims the file reading encoding to be configurable,  
> but I could not get it to work. Whether that was my mistake or a bug  
> in Chaperon is beyond me:-)
> 
> > Also does our Chaperon jar need updating?
> >
> > You mentioned an important mail thread below, but could
> > not provide the link at the time.
> 
> The link is provided in the first comment in the issue, just below  
> the "empty link" text.
> 
> > Thanks very much for your investigation and other effort.
> 
> Thank you (and all the others) for your work with Forrest!
> 
> > -David
> 
> Sjur

Thanks very much, these findings may be the solution to FOR-492. :)

Regards

> 
> >>> Wiki input files (*.jspwiki) is not correctly read when in UTF-8
> >>> ----------------------------------------------------------------
> >>>
> >>>          Key: FOR-435
> >>>          URL: http://issues.apache.org/jira/browse/FOR-435
> >>>      Project: Forrest
> >>>         Type: Bug
> >>
> >>>   Components: Plugin: input.wiki
> >>>     Versions: 0.8-dev, 0.7
> >>>  Environment: MacOS X, 10.3.8, Java 1.4.2
> >>>     Reporter: Sjur N. Moshagen
> >>
> >>>
> >>> According to the documentation at:
> >>> http://chaperon.sourceforge.net/using-cocoon.html
> >>> it should be possible to configure the Wiki plugin (or any plugin  
> >>> based on Chaperon) for different encodings of the input file, in  
> >>> my case UTF-8.
> >>> But this does not work. I have:
> >>>       <map:transformer name="lexer"
> >>>                        src="org.apache.cocoon.transformation.LexicalTransformer"
> >>>                        logger="sitemap.transformer.lexer">
> >>>         <map:parameter name="localizable" value="true"/>
> >>>         <map:parameter name="encoding" value="UTF-8"/>
> >>>       </map:transformer>
> >>> in the input.xmap file in $FORREST_HOME/plugins/wiki, and I have  
> >>> run "ant local-deploy", but to no avail: multibyte UTF-8  
> >>> sequences come out as the Latin-1 counterpart of each byte in the  
> >>> sequence.
> >>> A discussion about this bug can be found at:
> >>> [mail archive not yet updated, will add link here later]
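
The symptom described in the report (each byte of a multibyte UTF-8
sequence showing up as its Latin-1 counterpart) can be reproduced in
isolation. This is only a sketch of the effect, not of what Chaperon
does internally; the class name MojibakeDemo and the sample character
are made up, and it assumes Java 7 or later for StandardCharsets:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Forrest, Cocoon or Chaperon.
public class MojibakeDemo {

    // Encode the text as UTF-8, then (wrongly) decode the bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00e5";            // LATIN SMALL LETTER A WITH RING ABOVE
        String garbled = misdecode(original);  // bytes C3 A5 read as two Latin-1 chars

        System.out.println("chars in original: " + original.length());
        System.out.println("chars in garbled:  " + garbled.length());
        for (char c : garbled.toCharArray()) {
            // Print code points so the result does not depend on the console encoding.
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The one-character input comes back as two characters, U+00C3 and U+00A5,
which are the Latin-1 readings of the two UTF-8 bytes.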

-- 
Thorsten Scherler
COO Spain
Wyona Inc.  -  Open Source Content Management  -  Apache Lenya
http://www.wyona.com                   http://lenya.apache.org
thorsten.scherler@wyona.com                thorsten@apache.org

