forrest-dev mailing list archives

From David Crossley <cross...@apache.org>
Subject Re: howto-custom-html-source
Date Tue, 03 Jan 2006 08:47:54 GMT
Paul Bolger wrote:
> Thanks David. My apologies for the break in transmission: Xmas etc...
> I've had a go - a few goes actually - at getting this to work, and I'm
> still not getting anywhere.
> I've inserted the following into my sitemap.xmap file:
> 
>  <map:match pattern="**dirtyhtml**.xml">
>  <map:generate src="{project:content.xdocs}{1}/dirtyhtml/{2}.html" />
>  <map:transform src="{project:resources.stylesheets}/puck.xsl" />
>  <map:serialize type="xml"/>
> </map:match>

But that is not what we discussed below.

Here is a quick trip down development lane :-)

mkdir /tmp/my-site; cd /tmp/my-site
forrest seed-sample

cp someDirty.html src/documentation/content/xdocs/samples/dirtyhtml/index.html
 (e.g. get the example attachment from FOR-775 [1])

forrest run

browser http://localhost:8888/samples/dirtyhtml/index.html
Forrest will try to render the html, but you want to
extract some special content so make your own sitemap.

Let's build it bit-by-bit to make sure that we have it correct
at each step of the way. Add the following to
src/documentation/sitemap.xmap

<map:match pattern="**/dirtyhtml/**.html">
 <map:generate src="{project:content.xdocs}{1}/dirtyhtml/{2}.html" />
 <map:serialize type="xml"/>
</map:match>

That will read the html and serialise it as xml.

Now add our own transformer ...
<map:match pattern="**/dirtyhtml/**.html">
 <map:generate src="{project:content.xdocs}{1}/dirtyhtml/{2}.html" />
 <map:transform src="{project:resources.stylesheets}/stripContent-to-html.xsl" />
 <map:serialize type="xml"/>
</map:match>

It will extract only the "div class=content" element and transform that
to plain html. Save the example stylesheet attached to FOR-775 [1]
as src/documentation/resources/stylesheets/stripContent-to-html.xsl

That should now produce only the html fragment that
you are interested in.
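
If you cannot get at the attachment, here is a minimal sketch of what
such a stylesheet might look like. It is not the FOR-775 attachment
itself. It assumes that the generated html elements are in no namespace
and that the wanted content is the first div with class="content";
adjust the XPath expressions if your source differs.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Replace the whole input document with a minimal html wrapper
       that holds only the wanted div. -->
  <xsl:template match="/">
    <html>
      <head>
        <!-- Keep the original page title, if there is one. -->
        <title><xsl:value-of select="//title"/></title>
      </head>
      <body>
        <!-- Assumption: the first div with class="content" is the part
             to keep. Everything outside it is dropped. -->
        <xsl:copy-of select="(//div[@class='content'])[1]"/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```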

Now add the standard html-to-document transformer.

<map:match pattern="**/dirtyhtml/**.html">
 <map:generate src="{project:content.xdocs}{1}/dirtyhtml/{2}.html" />
 <map:transform src="{project:resources.stylesheets}/stripContent-to-html.xsl" />
 <map:transform src="{forrest:stylesheets}/html-to-document.xsl"/>
 <map:serialize type="xml"/>
</map:match>

The output will now be in the internal xdoc format.
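
As a rough guide, the result should have a shape something like the
following. This is only an illustration of the xdoc structure; the
titles and text here are made up, and the real output depends on your
html and on what html-to-document.xsl does with it.

```xml
<?xml version="1.0"?>
<document>
  <header>
    <!-- Illustrative title only -->
    <title>Title taken from the html</title>
  </header>
  <body>
    <section>
      <title>A heading from the content div</title>
      <p>A paragraph from the content div.</p>
    </section>
  </body>
</document>
```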

Now stop matching the .html extension and match .xml instead,
and serialise the result as the forrest internal format,
i.e. add the proper DOCTYPE so that the forrest
internal machinery will deal with it. So this
is the final match for your sitemap ...

<map:match pattern="**/dirtyhtml/**.xml">
 <map:generate src="{project:content.xdocs}{1}/dirtyhtml/{2}.html" />
 <map:transform src="{project:resources.stylesheets}/stripContent-to-html.xsl" /> 
 <map:transform src="{forrest:stylesheets}/html-to-document.xsl"/>
 <map:serialize type="xml-document"/>
</map:match>

The above stuff will probably need refinement, e.g.
the XSL could be improved and the sitemap could use
the new locationmap.
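
To show where the match is supposed to go, here is a sketch of a
complete minimal src/documentation/sitemap.xmap wrapping the final
match above. The comment walks through one example request, assuming
the seed-sample layout used earlier; treat it as a starting point
rather than a tested configuration.

```xml
<?xml version="1.0"?>
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:pipelines>
    <map:pipeline>
      <!-- Example: when the browser asks for samples/dirtyhtml/index.html
           and the internal request for samples/dirtyhtml/index.xml
           arrives here, {1} is "samples" and {2} is "index", so the
           generator reads
           {project:content.xdocs}samples/dirtyhtml/index.html -->
      <map:match pattern="**/dirtyhtml/**.xml">
        <map:generate src="{project:content.xdocs}{1}/dirtyhtml/{2}.html" />
        <map:transform src="{project:resources.stylesheets}/stripContent-to-html.xsl" />
        <map:transform src="{forrest:stylesheets}/html-to-document.xsl"/>
        <map:serialize type="xml-document"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```

Any request that does not match this pattern falls through to the
main forrest sitemaps.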

[1] http://issues.apache.org/jira/browse/FOR-775

> It's the first entry in the <pipelines> section. Bearing in mind what
> you said about the directory separators I tried a few variations on
> the syntax.

Hmmm, I didn't say anything about directory separators.
Forrest always uses URLs, so even a file:/// local URL
has slashes, not back-slashes.

> I found the result either passed the html page straight
> through, which I assume means that the match isn't being made,

It was probably doing as instructed :-)

> or
> produced the following error:
> 
> test\src\documentation\content\xdocs\dirtyhtml\default.body.html (The
> system cannot find the file specified)
> 
> This happened when I used the code above.
> 
> As a matter of interest, how would one extend the match to include
> files with .htm and .asp extensions?

Have a look at the whiteboard/plugins/org.apache.forrest.plugin.output.php
for an example.

-David

> On 18/12/05, David Crossley wrote:
> > David Crossley wrote:
> > > Paul Bolger wrote:
> > > > I've been trying to get this to work, and I'm not sure what's going
> > > > wrong. I'll explain what I'd like to be able to do: I'd like to point
> > > > at a directory, and its subdirectories, processing all html files so
> > > > that all content outside a #content div is stripped.
> > >
> > > Ah, that comment indicates a basic misunderstanding
> > > about how Cocoon operates. It doesn't actually process
> > > directories [1]. Rather it handles requests. Depending
> > > on the components of the URL, the sitemap will respond
> > > by matching certain patterns.
> > >
> > > You need a project sitemap (or plugin if it is common
> > > functionality) to intercept the specific matches that
> > > you want to transform. Any matches that remain are handled
> > > by the guts of forrest.
> > >
> > > Some of our documentation explains how to handle specific
> > > matches. As usual our docs need attention. This doc
> > > is close, but you need to wade through the example that
> > > it points to, because only part of that is relevant.
> > > http://forrest.apache.org/docs/project-sitemap.html
> > >
> > > Basically you need a project sitemap.xmap like this
> > > where "this-tree" is the directory tree to which
> > > you want to apply special processing ...
> > >
> > > <map:match pattern="**/this-tree/**.xml">
> > >  <map:generate src="{project:content.xdocs}{1}/this-tree/{2}.html" />
> > >  <map:transform src="{project:resources.stylesheets}/myStripContent-to-document.xsl" />
> > >  <map:serialize type="xml"/>
> > > </map:match>
> >
> > Of course, that should be <map:serialize type="xml-document"/>
> >
> > Also your "myStripContent" transformer could probably
> > just remove the bits that you don't want and then follow
> > it with the forrest html transformer. So ...
> >
> > <map:match pattern="**/this-tree/**.xml">
> >  <map:generate src="{project:content.xdocs}{1}/this-tree/{2}.html" />
> >  <map:transform src="{project:resources.stylesheets}/myStripContent-to-html.xsl" />
> >  <map:transform src="{forrest:stylesheets}/html2document.xsl"/>
> >  <map:serialize type="xml-document"/>
> > </map:match>
> >
> > > (Caveat: Be careful with those directory separators
> > > in the match and generate components: The ** will match
> > > a slash. I just added the above for readability.)
> > >
> > > In other words, presume that the request is
> > > localhost:8888/some-dir/this-tree/foo/bar.html
> > > then your sitemap would fire and it would generate
> > > xml content from xdocs/some-dir/this-tree/foo/bar.html
> > > and apply your transformer to produce the forrest
> > > internal document structure.
> > >
> > >                   --oOo--
> > >
> > > [1] Preparing a directory listing, say for a table
> > > of contents page is another matter. For that you
> > > would use more complex Cocoon sitemap operations.
> > > See DirectoryGenerator which traverses the directory
> > > tree and generates an xml fragment. Apply a Transformer
> > > to that to turn it into forrest internal xml format.
> > >
> > > You would need to follow Cocoon sitemap docs. Start at
> > > http://forrest.apache.org/docs/project-sitemap.html
> > > Understand sitemaps and then see:
> > > http://cocoon.apache.org/2.1/userdocs/directory-generator.html
> > >
> > > We need to add an example to our seed-sample site.
> > >
> > > > This How-To is
> > > > very detailed and I've learnt a lot from it, but it'd be good to have
> > > >
> > > > a. an example file of sitemap.xmap with the extra element included (I
> > > > can't find the place that it's supposed to go...)
> > > >
> > > > and
> > > >
> > > >  b. an example xsl file.
> > >
> > > The stylesheet to strip everything except "div class=content"
> > > is a simple XSLT operation. Not appropriate for this list.
> > > The "XSL FAQ" is a fantastic resource http://www.dpawson.co.uk/xsl/
> > > and get Michael Kay's book.
> > >
> > > -David
> >
> 
> 
> --
> Paul Bolger
> 19 Raggatt St
> Alice Springs
> NT 0870
> 08 8953 6780
