forrest-dev mailing list archives

From Nicola Ken Barozzi <>
Subject Re: Sitemap woes and semantic linking (Re: URI spaces: source, processing, result)
Date Thu, 12 Dec 2002 14:37:53 GMT

Andrew Savory wrote:
> On Thu, 12 Dec 2002, Jeff Turner wrote:
>>Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
>>the sitemap would be really nice.  The overhead of a <map:read> for every
>>Javadoc page probably wouldn't be noticed in a live webapp.  But for the
>>command-line?  Imagine how long it would take for the crawler to grind
>>through _every_ Javadoc page, effectively copying it unmodified from A to
> I guess on the plus side, everything is still controlled in one place, and
> since it's on the command line, it can be automated. The downside, as you
> mention, is speed. But is Cocoon significantly slower doing a map:read
> than, say, a "cp" on the command-line? What sort of factor of trade-off
> are we talking about?

The actual problem is the Cocoon CLI, which crawls links.
The server version does not have this problem, so it's a CLI issue, not
a Cocoon issue.
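
For concreteness, the kind of sitemap entry under discussion might look
like the sketch below. The `apidocs/**` pattern and the `build/javadocs/`
directory are assumptions made up for illustration, not taken from
Forrest's actual sitemap, and the fragment would sit inside a
<map:pipeline> element.

```xml
<!-- Hypothetical fragment: serve pre-generated Javadoc files unmodified. -->
<map:match pattern="apidocs/**">
  <!-- {1} expands to whatever the ** wildcard matched. -->
  <map:read src="build/javadocs/{1}"/>
</map:match>
```

In a running server a page goes through this match only when someone
requests it. The CLI has to crawl links to discover pages, so every
Javadoc file passes through the match once per build.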

>>IMO, the _real_ problem is that the sitemap has been sold as a generic
>>URI management system, but it works at the level of a specific XML
>>publishing tool.  Its scope is overly broad.
> Again, it's a pro/con kind of argument: I *like* that everything is dealt
> with within the Cocoon sitemap: my httpd/servlet engines are
> interchangeable, but Cocoon is a constant.

Not only that. Cocoon is *not* a servlet app. It's an XML processing engine.
So it should manage everything it serves, so that its apps can be ported
to every environment Cocoon can run in.

>>So where does Forrest stand?  We have servlet containers with wholly
>>inadequate URI mapping.  We have Cocoon, trying to handle requests for
>>binary content which it shouldn't, resulting in hopeless performance.  We
>>have httpd, with good URI handling (eg mod_rewrite), but whose presence
>>can't be relied upon.  What is the way out?
> Well, one solution might be to split the sitemap (URI mapping) from
> the sitemap (URI handling), and have a separate URI daemon that can run in
> front of Cocoon (and in front of httpd, Tomcat, etc too). This seems kinda
> drastic though, and could lead to a tangled mess of rewrites at each
> stage.

Exactly. These problems are not necessarily bad things about Cocoon,
but bugs or missing features. We should not circumvent them with hacks,
but be able to manage them better in Cocoon.

>>There is some context you're missing there..
> Ok, gotcha. That seems fair, apologies for rehashing old discussions.
>>A more current example of this principle: say we want to link to class
>>MyClass:  <link href="">.  Now say we have
>>Javadoc, UML and qdox representations of that resource.  Should we invent
>>three new protocols; javadoc:, uml: and qdox:, or should we add a 'type'
>>attribute specifying a MIME type (inventing one if we have to)?
> Hrm, ok. But if we have javadoc it is going to be HTTP/HTML, so why
> javadoc: as a protocol? Come to think of it, why java: as a protocol? If
> the part of any href before a colon refers to the transport, is it right
> to effectively overload the transport with additional MIME type
> information? (That's not to say I'm in favour of the +uml notation
> either... do we need another attribute?)

Nicola Ken Barozzi         
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
