forrest-dev mailing list archives

From Jeff Turner <>
Subject Sitemap woes and semantic linking (Re: URI spaces: source, processing, result)
Date Thu, 12 Dec 2002 11:32:58 GMT
On Thu, Dec 12, 2002 at 10:39:05AM +0000, Andrew Savory wrote:
> On Thu, 12 Dec 2002, Steven Noels wrote:
> > could you please comment on my summary, too? Also, I'd like to hear the
> > opinion of others.
> Ok, caveat: I've not used Forrest (yet), but I use Cocoon extensively.
> Jeff Turner wrote:
> > Are you really suggesting that requests for Javadoc pages should go
> > through Cocoon?
> >
> > But the problem is real: how do we integrate Javadocs into
> > the URI space.
> >
> > I'd say write out .htaccess files with mod_rewrite rules, and figure out
> > what the equivalent for Tomcat is.  Perhaps a separate servlet..
> > _anything_ but Cocoon ;P
> Whilst I understand your concern about passing 21mb of files through
> Cocoon untouched, I'm not sure there's a more elegant way of handling URI
> space issues, without ending up bundling a massive amount of software with
> Forrest (or making unrealistic software prerequisite installation
> demands).
> So, since Cocoon _can_ handle the rewriting concern, and is already in
> Forrest, why not use it?

Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
the sitemap would be really nice.  The overhead of a <map:read> for every
Javadoc page probably wouldn't be noticed in a live webapp.  But for the
command-line?  Imagine how long it would take for the crawler to grind
through _every_ Javadoc page, effectively copying it unmodified from A to
B?

IMO, the _real_ problem is that the sitemap has been sold as a generic
URI management system, but it works at the level of a specific XML
publishing tool.  Its scope is overly broad.  The webserver (Tomcat)
should be defining the 'site map', and Cocoon should never even _see_
requests for static resources.  Just like mod_jk only forwards servlet
and JSP requests on to Tomcat, Tomcat should only forward requests for
XML processing on to Cocoon.  So <map:read> is a hack to handle requests
that Cocoon should never have been asked to handle in the first place.
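
To make the division of labour concrete, here is a rough sketch of the kind
of front-door dispatch I have in mind.  It is only an illustration: the
patterns, the "apidocs/" prefix and the route() function are all made up,
and nothing like this exists in Tomcat or Forrest today.

```python
from fnmatch import fnmatchcase

# Hypothetical: URI prefixes holding pre-generated static content (eg Javadocs).
STATIC_PREFIXES = ["apidocs/"]

# Hypothetical: patterns for resources that are produced by XML processing.
COCOON_PATTERNS = ["*.html", "*.pdf"]


def route(path):
    """Decide who serves a request: the container itself, or Cocoon."""
    # Static trees are checked first, so Javadoc HTML never reaches Cocoon.
    if any(path.startswith(prefix) for prefix in STATIC_PREFIXES):
        return "static"
    # Only requests that need XML processing are forwarded.
    if any(fnmatchcase(path, pattern) for pattern in COCOON_PATTERNS):
        return "cocoon"
    # Everything else (images, CSS, ...) is served directly.
    return "static"
```

With something like this in front, route("apidocs/org/Foo.html") is served
directly, and only route("primer.html") ends up in Cocoon.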

So where does Forrest stand?  We have servlet containers with wholly
inadequate URI mapping.  We have Cocoon, trying to handle requests for
binary content which it shouldn't, resulting in hopeless performance.  We
have httpd, with good URI handling (eg mod_rewrite), but whose presence
can't be relied upon.  What is the way out?

> I like the idea of link naming schemes, but I'm really worried about the
> idea of specifying MIME types as link attributes. This seems like a nasty
> hack: should we be specifying MIME types?

There is some context you're missing there..

The theory is that links should _not_ specify the MIME type of linked-to
docs by default.  The MIME type should be inferred from the type of the
linking document, and what's available.  Eg, <link href="site:/primer">
links to "The Forrest Primer" in whatever form it's available.

However it is also sometimes desirable to specify the MIME type
explicitly.  So rather than corrupt our nice semantic URLs, eg <link
href="site:/primer.pdf">, we should express the type as a separate
attribute: <link href="site:/primer" type="application/pdf">.
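
A minimal sketch of how such a link might be resolved, purely to show the
idea.  The SITE table, the file names and the resolve() function are
assumptions for illustration, not anything Forrest implements.

```python
# Hypothetical registry: semantic target -> representations that exist.
SITE = {
    "/primer": {
        "text/html": "primer.html",
        "application/pdf": "primer.pdf",
    },
}


def resolve(href, source_type="text/html", mime_type=None):
    """Map a 'site:' link to a concrete file name.

    source_type is the MIME type of the linking document; mime_type is the
    optional 'type' attribute given explicitly on the link.
    """
    scheme, _, path = href.partition(":")
    if scheme != "site":
        return href  # not a semantic link, leave it untouched
    forms = SITE[path]
    if mime_type is not None:
        # An explicit type must be honoured or fail loudly.
        if mime_type not in forms:
            raise LookupError("no %s form of %s" % (mime_type, path))
        return forms[mime_type]
    if source_type in forms:
        # Default: same type as the linking document.
        return forms[source_type]
    # Otherwise fall back to whatever form is available.
    return next(iter(forms.values()))
```

So resolve("site:/primer") from an HTML page gives the HTML version, while
resolve("site:/primer", mime_type="application/pdf") gives the PDF, and the
href itself stays free of file extensions.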

A more current example of this principle: say we want to link to class
MyClass:  <link href="">.  Now say we have
Javadoc, UML and qdox representations of that resource.  Should we invent
three new protocols; javadoc:, uml: and qdox:, or should we add a 'type'
attribute specifying a MIME type (inventing one if we have to)?



> Andrew.
> -- 
> Andrew Savory                                Email:
> Managing Director                              Tel:  +44 (0)870 741 6658
> Luminas Internet Applications                  Fax:  +44 (0)700 598 1135
> This is not an official statement or order.    Web:
