forrest-dev mailing list archives

From David Crossley <cross...@apache.org>
Subject Re: [RT] crawl our dynamic forrest rather than commandline
Date Fri, 02 Sep 2005 03:25:04 GMT
Ross Gardler wrote:
> Nicola Ken Barozzi wrote:
> >David Crossley wrote:
> >
> >>We would rather use Forrest in dynamic mode
> >>so that we do not need to worry about the
> >>filename extensions in the output space and
> >>take more advantage of the Cocoon facilities
> >>like "Cocoon views" etc.
> >>
> >>However, we must be able to produce a static
> >>set of documents. That constrains us to the
> >>filename extension thing.
> >>
> >>Would it be possible to use an external tool
> >>like "wget" or maybe Apache Ant, to crawl a local
> >>Forrest server and detect the mime-types and create
> >>the set of files, appending the appropriate extension?
> >>
> >>That is just a wild thought, but so many times
> >>i read back through our mail archives, and see
> >>us hindered by this need to stick with the
> >>filename extensions and limit our use of Cocoon.
> >>Our design decisions are hampered.
> >
> >I'm sure (I have seen the code) that Cocoon CLI has been thought to be
> >able to crawl also links with ?a=b parameters in it, although I have
> >never tried it.
> 
> I have tried it, the problem is that a '?' is not legal in a file name 
> on some platforms. So it gets converted to a '_' (I think, it's that 
> anyway, can't remember exactly). As a result anything with a parameter 
> breaks the filenames.
> 
> I have no idea how something like wget does it.

Perhaps I am not explaining my concept very well.

Anyway, I have a new idea. Forget wget and use
our own Cocoon capabilities.

I wonder if we can make a special pipeline in Forrest
that does the following:
* crawls the dynamic server (i.e. crawls itself)
* determines each file type (by using the mime-type
  that Forrest indicates and perhaps also a map of hints)
* transforms each document to rewrite the
  links (e.g. howto/foobar => howto/foobar.html)
* uses the Cocoon SourceWritingTransformer to
  write each file to disk with the relevant filename
  extension.
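
To make the middle two steps concrete, here is a rough sketch in
Python rather than as a Cocoon pipeline. It only shows the
mime-type-to-extension mapping and the link rewriting; the crawl and
the write to disk are left out. The HINTS map and the function names
(extension_for, output_name, rewrite_links) are made up for
illustration and are not part of Forrest or Cocoon.

```python
import mimetypes
import posixpath
import re

# Hypothetical "map of hints": consulted before the stdlib guess so that
# the common types always get a predictable extension.
HINTS = {
    "text/html": ".html",
    "application/pdf": ".pdf",
    "text/xml": ".xml",
}


def extension_for(mime_type):
    """Return a filename extension for a Content-Type value, or '' if unknown."""
    # Drop parameters such as "; charset=UTF-8" before the lookup.
    base = mime_type.split(";", 1)[0].strip().lower()
    if base in HINTS:
        return HINTS[base]
    return mimetypes.guess_extension(base) or ""


def output_name(url_path, mime_type):
    """Map a crawled URL path to the filename it would get on disk."""
    ext = extension_for(mime_type)
    if url_path.endswith("/"):
        return url_path + "index" + ext
    if posixpath.splitext(url_path)[1] == ext:
        return url_path  # already carries the right extension
    return url_path + ext


# Matches plain href values only; links carrying a "#fragment" or a
# "?query" are deliberately left untouched in this sketch.
HREF = re.compile(r'href="([^"#?]+)"')


def rewrite_links(html, names):
    """Rewrite href targets found in `names` to their on-disk filenames."""
    def swap(match):
        target = match.group(1)
        return 'href="%s"' % names.get(target, target)
    return HREF.sub(swap, html)


if __name__ == "__main__":
    # Pretend the crawl reported this path with this mime-type.
    names = {"howto/foobar": output_name("howto/foobar", "text/html; charset=UTF-8")}
    print(rewrite_links('<a href="howto/foobar">How-to</a>', names))
```

The point of building the whole name map first is that the links can
only be rewritten once the mime-type of every target is known, so the
crawl would probably need to finish before any document is written.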

-David
