forrest-dev mailing list archives

From Andrew Savory <and...@luminas.co.uk>
Subject Re: Sitemap woes and semantic linking (Re: URI spaces: source, processing, result)
Date Thu, 12 Dec 2002 12:07:34 GMT

On Thu, 12 Dec 2002, Jeff Turner wrote:

> Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
> the sitemap would be really nice.  The overhead of a <map:read> for every
> Javadoc page probably wouldn't be noticed in a live webapp.  But for the
> command-line?  Imagine how long it would take for the crawler to grind
> through _every_ Javadoc page, effectively copying it unmodified from A to
> B.

I guess on the plus side, everything is still controlled in one place, and
since it's on the command line, it can be automated. The downside, as you
mention, is speed. But is Cocoon significantly slower doing a map:read
than, say, a "cp" on the command-line? What sort of factor of trade-off
are we talking about?

> IMO, the _real_ problem is that the sitemap has been sold as a generic
> URI management system, but it works at the level of a specific XML
> publishing tool.  Its scope is overly broad.

Again, it's a pro/con kind of argument: I *like* that everything is dealt
with within the Cocoon sitemap: my httpd/servlet engines are
interchangeable, but Cocoon is a constant.

> So where does Forrest stand?  We have servlet containers with wholly
> inadequate URI mapping.  We have Cocoon, trying to handle requests for
> binary content which it shouldn't, resulting in hopeless performance.  We
> have httpd, with good URI handling (eg mod_rewrite), but whose presence
> can't be relied upon.  What is the way out?

Well, one solution might be to split the sitemap (URI mapping) from
the sitemap (URI handling), and have a separate URI daemon that can run in
front of Cocoon (and in front of httpd, Tomcat, etc too). This seems kinda
drastic though, and could lead to a tangled mess of rewrites at each
stage.
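
As a rough illustration of that split (not anything Cocoon or Forrest
provides; the rule table, handler names and paths below are all invented),
the mapping stage could be a pure function that only decides who handles a
URI and under what name, leaving the actual handling to someone else:

```python
import re

# Hypothetical rewrite rules: (pattern, handler name, target template).
# The first matching rule wins.
REWRITES = [
    # Javadoc is already rendered, so hand it to a plain file copy.
    (re.compile(r"^apidocs/(.*)$"), "static", r"build/javadoc/\1"),
    # Generated pages come from XML sources processed by Cocoon.
    (re.compile(r"^(.*)\.html$"), "cocoon", r"\1.xml"),
]

def map_uri(uri):
    """Return (handler, source) for a requested URI without serving it."""
    for pattern, handler, target in REWRITES:
        match = pattern.match(uri)
        if match:
            return handler, match.expand(target)
    # Anything unmatched is assumed to be a static resource.
    return "static", uri
```

The ordering matters here: the apidocs rule has to come before the
generic .html rule, which is exactly the sort of thing that could get
tangled once rewrites exist at more than one stage.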

> There is some context you're missing there..
>
> http://marc.theaimsgroup.com/?l=forrest-dev&m=103097808318773&w=2

Ok, gotcha. That seems fair, apologies for rehashing old discussions.

> A more current example of this principle: say we want to link to class
> MyClass:  <link href="java:org.apache.foo.MyClass">.  Now say we have
> Javadoc, UML and qdox representations of that resource.  Should we invent
> three new protocols; javadoc:, uml: and qdox:, or should we add a 'type'
> attribute specifying a MIME type (inventing one if we have to)?

Hrm, ok. But if we have javadoc it is going to be HTTP/HTML, so why
javadoc: as a protocol? Come to think of it, why java: as a protocol? If
the part of any href before a colon refers to the transport, is it right
to effectively overload the transport with additional MIME type
information? (That's not to say I'm in favour of the +uml notation
either... do we need another attribute?)
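
To make the 'type' attribute option concrete, here is a small Python
sketch. It is only an assumption about how such a link might be resolved:
the lookup table, the output paths and the application/x-uml MIME type are
made up for the example.

```python
# Schemes that name a kind of resource rather than a transport (assumed).
SEMANTIC_SCHEMES = {"java"}

# Hypothetical table: (scheme, MIME type) -> output path template.
RENDERINGS = {
    ("java", "text/html"): "apidocs/{path}.html",
    ("java", "application/x-uml"): "uml/{path}.png",  # invented MIME type
}

def resolve(href, mime_type="text/html"):
    """Turn a semantic link into a concrete path; pass other links through."""
    scheme, sep, name = href.partition(":")
    if not sep or scheme not in SEMANTIC_SCHEMES:
        return href  # ordinary link such as http://... or a relative path
    template = RENDERINGS.get((scheme, mime_type))
    if template is None:
        raise KeyError(f"no {mime_type} rendering known for {scheme}: links")
    # Dotted class name becomes a directory path.
    return template.format(path=name.replace(".", "/"))
```

With this, resolve("java:org.apache.foo.MyClass") picks the Javadoc page
by default, and passing a different type selects another representation
of the same class without inventing a new scheme for each one.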


Andrew.

-- 
Andrew Savory                                Email: andrew@luminas.co.uk
Managing Director                              Tel:  +44 (0)870 741 6658
Luminas Internet Applications                  Fax:  +44 (0)700 598 1135
This is not an official statement or order.    Web:    www.luminas.co.uk

