forrest-user mailing list archives

From Ross Gardler <>
Subject Reusing legacy HTML (was Re: Two questions)
Date Tue, 15 Feb 2005 14:17:47 GMT
Ferdinand Soethe wrote:
> let me know if you'd rather have me post these questions to the list

It is much better to post them to the list: you get the benefit of other
people's eyes, which sometimes means a better solution, sometimes a
faster response, and sometimes it'll be me answering anyway. Furthermore,
it means the issues appear in the archives.

Of course, this last point is less important if you are able to write up
this info as a doc.

> If not, here are my questions:

Well, here's my answer (I've CC'd the users list for the above
reasons, and I've also reset the Reply-To header to the users list).

[Also note that, now I've had time to think about it, this is a little
different from what I said on Skype.]

> - I want to process a legacy html file named mybad.html (just one for
>   a start). In order to catch it I created the second pipeline in sitemap.xmap
>   in my project directory.
>   <map:pipeline>
>     <map:match pattern="mybad.html">
>       <map:generate src="cocoon:/**.html"/>
>       <map:transform type="log">
>         <map:parameter name="logfile" value="mydebug.log"/>
>         <map:parameter name="append" value="false"/>
>       </map:transform>
>       <map:serialize type="xml"/>
>     </map:match>
>   </map:pipeline>  
> What I'm unclear about is
> a) What is the correct pattern to use in the cocoon:/-pseudo-protocol.

It's exactly the same as any other URL: "cocoon:" indicates the protocol
and is followed by the path to the resource you are referring to.

The number of slashes after the protocol is significant. A single slash
(e.g. "cocoon:/myfile.xml") means look only in the current sitemap file;
two slashes (e.g. "cocoon://myfile.xml") mean start looking from the
root sitemap.
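To make the difference concrete, here is a minimal sketch of the two
forms side by side. They are alternatives, not consecutive steps (a
pipeline has only one generator), and the resource name mybad.xml is
only for illustration; what matters is the number of slashes.

```xml
<!-- One slash: resolved against the current sitemap only,
     i.e. it must be matched by a <map:match> in this same sitemap.xmap -->
<map:generate src="cocoon:/mybad.xml"/>

<!-- Two slashes: resolved starting from the root sitemap,
     which then passes it down to any mounted sub-sitemaps -->
<map:generate src="cocoon://mybad.xml"/>
```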

>    How or where can I find the correct reference if I'm dealing with
>    Forrest defaults like .html.

It's just the name of the file that you want. In this case the correct 
element is:

<map:generate src="cocoon:/mybad.html"/>

But beware: if your HTML is not well-formed XML, this will fail. To get
around this, use the JTidy generator; see forrest.xmap and the Cocoon
docs for examples.
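As a rough sketch of the JTidy route, assuming the "html" generator type
(Cocoon's JTidy-based org.apache.cocoon.generation.HTMLGenerator) is
already available from a parent sitemap, as the type="html" step further
down this mail also assumes. If it is not, it would need a
<map:generator name="html" .../> entry in <map:components>. The match
pattern and the legacy/mybad.html path are hypothetical; a relative src
is resolved against the directory of the sitemap containing the match.

```xml
<!-- Hypothetical match, for illustration only -->
<map:match pattern="mybad-tidied.xml">
  <!-- type="html" runs the file through JTidy before parsing, so markup
       that is not well-formed XML can still be turned into XML events -->
  <map:generate type="html" src="legacy/mybad.html"/>
  <map:serialize type="xml"/>
</map:match>
```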

> b) Am I correct that referencing another pipeline with cocoon will
>    divert the output of that pipeline to become the generator (or
>    input) of my pipeline.

Yes, although the terminology is not quite right. The generator makes a
request for the indicated resource and pipes the result into the pipeline.
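A minimal sketch of that chaining, with hypothetical names throughout
(legacy-body.xml, cleaned-body.xml, legacy/mybad.html and strip-nav.xsl
are all made up for the example): the outer match's generator requests
legacy-body.xml through cocoon:/, the inner match produces it, and that
result becomes the input of the outer pipeline's transforms.

```xml
<map:pipeline>
  <!-- Inner pipeline: produces the XML that the outer one pulls in -->
  <map:match pattern="legacy-body.xml">
    <map:generate type="html" src="legacy/mybad.html"/>
    <map:serialize type="xml"/>
  </map:match>

  <!-- Outer pipeline: its generator requests the inner pipeline's output
       and feeds that XML into the transform that follows -->
  <map:match pattern="cleaned-body.xml">
    <map:generate src="cocoon:/legacy-body.xml"/>
    <map:transform src="strip-nav.xsl"/>
    <map:serialize type="xml"/>
  </map:match>
</map:pipeline>
```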

>    If that is correct, how does matching fit into all of this or,
>    putting it differently, which way does the data actually travel:
>    E.g. if my matcher handles requests for "mybad.html", is this what actually
>    happens:
>    1.  The broader selector **.html somewhere deep down in Forrest
>        processes the file mybad.html as it would any other html-file,
>        converts it to xhtml via jtidy.
>    2.  The matcher for mybad.html comes into play and diverts the
>        xml-stream into my pipeline IF the requested file is mybad.html.
>        If not, the first pipeline delivers the xml-stream straight to
>        Forrest default processing.


Only one pipeline will handle a request: the first one found whose
matcher matches it. The exception to this is when a pipeline is called
internally using the cocoon: protocol. That is, you can execute multiple
pipelines that were triggered by the cocoon: protocol, but only one that
was triggered by any other protocol.
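To illustrate "first match wins" with the patterns from your mail, here
is a sketch in which the source paths (legacy/mybad.html and
content/{1}.xml) are hypothetical. A request for mybad.html is handled
entirely by the first match; the broader **.html match is never
consulted for it, and no data flows from one to the other.

```xml
<map:pipeline>
  <!-- Tried first: handles mybad.html on its own, start to finish -->
  <map:match pattern="mybad.html">
    <map:generate type="html" src="legacy/mybad.html"/>
    <map:serialize type="html"/>
  </map:match>

  <!-- Only tried for requests the match above did not take,
       e.g. index.html; {1} is whatever ** matched -->
  <map:match pattern="**.html">
    <map:generate src="content/{1}.xml"/>
    <map:serialize type="html"/>
  </map:match>
</map:pipeline>
```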

I think what you are trying to do is this:

      <map:match pattern="mybad.xml">
        <map:generate src="cocoon:/mybad.html" type="html"/>
        <!-- removed log stuff, see below -->
        <!-- your own stylesheet to strip old navigation etc. -->
        <map:transform src="..."/>
        <map:transform src="{forrest:stylesheets}/html2document.xsl"/>
        <map:transform type="idgen"/>
        <map:serialize type="xml"/>
      </map:match>

Note that your matcher is for XML, not for HTML. When the request for
mybad.html is made, Forrest looks for a match on html (in Forrest's
sitemap.xmap), which makes a request for "cocoon://mybad.xml"; that
request is matched by the above in your project sitemap. There you do
whatever transformation you need to strip old navigation and the like,
and serialise as XML. Forrest then processes the result as normal (i.e.
skins it).

Note that this is based on the HTML processing found in forrest.xmap;
I've just added a transformation step for you to manipulate your legacy HTML.
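For the stylesheet in the src="..." step, something along these lines
could be a starting point: an identity stylesheet that copies everything
through and drops the old navigation. The id "navigation" is
hypothetical; adjust the match to whatever marks the navigation in your
legacy pages. local-name() is used so the rule matches whether or not
the tidied HTML comes through in the XHTML namespace.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every attribute and node unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: the matched div (and everything inside it) is
       dropped. Its predicate gives it a higher default priority than
       the identity template, so it wins for these elements. -->
  <xsl:template match="*[local-name()='div'][@id='navigation']"/>
</xsl:stylesheet>
```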

> c)  Using the log-transformer component that I found in my book
>       <map:transform type="log">
>         <map:parameter name="logfile" value="mydebug.log"/>
>         <map:parameter name="append" value="false"/>
>       </map:transform>
>     I was trying to log the intermediate result for debugging purposes
>     into a file. Unfortunately when calling my pipeline I got the
>     error: 'Type 'log' does not exist for 'map:transform' at file:...'

You need to declare the component in your sitemap. See the
<map:components> element in the Cocoon docs.
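For example, a sketch of the declaration for your log step, assuming
Cocoon's org.apache.cocoon.transformation.LogTransformer, which as far
as I know is the class behind the logfile and append parameters you
used. <map:components> goes inside <map:sitemap>, before
<map:pipelines>; the pipeline part below simply reuses the match from
earlier in this mail.

```xml
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:components>
    <map:transformers>
      <!-- Registers the LogTransformer under the name "log",
           so that <map:transform type="log"> can be resolved -->
      <map:transformer name="log"
          src="org.apache.cocoon.transformation.LogTransformer"/>
    </map:transformers>
  </map:components>

  <map:pipelines>
    <map:pipeline>
      <map:match pattern="mybad.xml">
        <map:generate src="cocoon:/mybad.html" type="html"/>
        <!-- Logs the SAX events passing this point of the pipeline -->
        <map:transform type="log">
          <map:parameter name="logfile" value="mydebug.log"/>
          <map:parameter name="append" value="false"/>
        </map:transform>
        <map:serialize type="xml"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```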

