forrest-dev mailing list archives

From Steven Noels <stev...@outerthought.org>
Subject Re: Semantic linking (Re: [VOTE] Usage of file.hint.ext convention)
Date Mon, 02 Sep 2002 15:26:11 GMT
Jeff Turner wrote:

<snip/>

> So, what's the difference between <link href="primer.html"> and <link
> href="primer" content-type="text/html">? The same difference as between
> "identifying a resource" and "identifying a resource representation".
> The gods of the web have deemed that they are separate concerns; that
> "resource" != "resource representation"; they have separate identifiers;
> one a URI, the other a MIME type. Trying to identify both in one "href"
> element is mixing concerns.
> 
> Ahem. So there you go :) I fondly imagine this sort of thinking was going
> through Steven's head when he -1'ed extra extensions in URIs.

(not really answering your mail, just attaching myself to the righteous 
thought-train in this thread ;-)

I must have been fondly out of my mind as usual, I guess, but here's a
summary of an hour of intense mind battle in our offices just now, in
which I was challenged by Marc (who is actually much smarter than me but
has a problem with his hard drive organization, hence his need for two
extensions):

There are three kinds of sources being processed through Forrest's
request space:

                  Name                                   URI

1) XML (xdoc, docbook, YourGrammar)                **.{rendition}
2) XML-isable non-XML (e.g. DTD documentation)     **.{hint}.{rendition}
3) non-XML sources (images, static HTML/PDF/etc)   **.{extension}
    (detected by wrapping the pipelines
    in a ResourceExistAction)

{rendition} being html, pdf, wml, svg, ...
{hint} being dtdx ...

Examples:

1) manual/users/concepts.html
    pressreleases/2001-02-06.pdf

2) dtdx/document-v11.html
    /09/11/23.downloadstats.svg

3) architecture.png
    dist/forrest-src.tgz
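
To make the three URI shapes concrete, here is a minimal sketch (in
Python, purely for illustration, not the Forrest sitemap itself) of how
a request could be sorted into one of the three types. The function
name, the KNOWN_HINTS set and the exists_on_disk callback are
assumptions of this sketch; only "dtdx" is named as a hint above.

```python
from pathlib import PurePosixPath

# Illustrative hint names only: "dtdx" is from the message, the rest is assumed.
KNOWN_HINTS = {"dtdx", "downloadstats"}


def classify(uri, exists_on_disk, hints=KNOWN_HINTS):
    """Return 'static', 'hinted' or 'xml' for a request URI."""
    # Type 3: the requested file already exists on disk and is served as-is.
    if exists_on_disk(uri):
        return "static"
    # Type 2: **.{hint}.{rendition}, a known hint before the last extension.
    suffixes = PurePosixPath(uri).suffixes
    if len(suffixes) >= 2 and suffixes[-2][1:] in hints:
        return "hinted"
    # Type 1: **.{rendition}, rendered from an XML source.
    return "xml"
```

The on-disk check comes first because it wraps the other two cases;
anything that is neither an existing file nor a hinted URI is treated as
an XML source to be rendered.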

That being said, I believe we can set up a sitemap (*the* Forrest 
sitemap, which is the definitive reference for the URI space being 
processed by Forrest) that handles these three types of sources with 
only minimal prextensination [1] of our URI space.

1) Using CAPs, we are able to describe how XML sources, depending on
their grammar, must be preprocessed to conform to the intermediate
format. People will be able to link to a named XML document,
irrespective of the preprocessing required, using <link
href="path/name(.{rendition})"/> (and I must still read Jeff's analysis
of the merits of having an extension in the href linking attribute).

We were thinking along the lines of a configuration section in the
sitemap listing possible identifiers to assign documents to a certain
'document class': public identifier, root element name,
xsi:schemaLocation attribute,...

Configuration of the pipeline would then be done in a CAPAction, setting
sitemap parameters, i.e. selecting the correct
authoringformat2intermediateformat.xsl - I will expand on this once the
dust in my mind has settled (and Bruno has defined his implementation
strategy ;-)
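
As a rough illustration of the idea (not Bruno's implementation, which
does not exist yet), the sketch below assigns a document to a 'document
class' from its public identifier, root element name or
xsi:schemaLocation, and then picks a stylesheet for that class. It is
written in Python for brevity; the rule table, the class names, the
example.org namespace and the stylesheet file names are all
hypothetical.

```python
import re
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# Only handles a double-quoted PUBLIC identifier in the DOCTYPE.
DOCTYPE_PUBLIC = re.compile(r'<!DOCTYPE\s+[^\s>]+\s+PUBLIC\s+"([^"]*)"')

# Hypothetical configuration: (kind of identifier, value, document class).
DOCUMENT_CLASSES = [
    ("public-id", "-//APACHE//DTD Documentation V1.1//EN", "xdoc"),
    ("schema-location", "http://example.org/ns/report", "report"),
    ("root-element", "article", "docbook"),
]

# Hypothetical names following the authoringformat2intermediateformat.xsl pattern.
STYLESHEETS = {
    "xdoc": "xdoc2intermediate.xsl",
    "docbook": "docbook2intermediate.xsl",
    "report": "report2intermediate.xsl",
}


def identifiers(xml_text):
    """Collect the three identifiers that the rules can match against."""
    root = ET.fromstring(xml_text)
    doctype = DOCTYPE_PUBLIC.search(xml_text)
    return {
        "public-id": doctype.group(1) if doctype else None,
        "root-element": root.tag.rsplit("}", 1)[-1],  # drop any {namespace}
        "schema-location": root.get("{%s}schemaLocation" % XSI_NS),
    }


def document_class(xml_text):
    """Return the first matching document class, or None."""
    found = identifiers(xml_text)
    for kind, value, doc_class in DOCUMENT_CLASSES:
        actual = found[kind]
        if actual is None:
            continue
        # schemaLocation is a whitespace-separated list; the others match exactly.
        if kind == "schema-location":
            matched = value in actual.split()
        else:
            matched = actual == value
        if matched:
            return doc_class
    return None


def select_stylesheet(xml_text):
    """Pick the stylesheet for the detected document class, if any."""
    return STYLESHEETS.get(document_class(xml_text))
```

The rules are tried in order, so the more specific identifiers (public
identifier, schema location) can be listed before the root element name.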

The pipeline is basically divided into two parts: pre- and
post-intermediate format (IMF). The pre-IMF part should not be 'visible'
to the document editor: he just authors a document using a certain
grammar and stores it on disk. The post-IMF part contains the skinning,
TOC aggregation, etc.

Finally, the rendition is specified using the extension, and is part of
the post-IMF process (= part of the document author's concern when
creating a link).

2) Given the hint, the pipeline can be specially configured, e.g. by
setting the Generator type to nekodtd for a DTD source - the rendition
is specified using the extension, as in 1). The XML originating from
those sources can then be subject to CAP-processing.
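
A small sketch of that lookup, assuming the dtdx hint is the one that
maps to the nekodtd generator (my reading of the above; no other hints
are defined yet). The function name and the shape of the returned
settings are made up for illustration.

```python
from pathlib import PurePosixPath

# "dtdx" -> "nekodtd" is inferred from the message; further entries would be guesses.
GENERATOR_FOR_HINT = {"dtdx": "nekodtd"}


def pipeline_settings(uri):
    """Split **.{hint}.{rendition} and look up the generator for the hint."""
    suffixes = PurePosixPath(uri).suffixes
    if len(suffixes) < 2:
        return None  # no hint present, so this is not a type 2 request
    hint, rendition = suffixes[-2][1:], suffixes[-1][1:]
    if hint not in GENERATOR_FOR_HINT:
        return None  # unknown hint: leave it to the other pipelines
    return {
        "hint": hint,
        "generator": GENERATOR_FOR_HINT[hint],
        "rendition": rendition,
    }
```

For a (hypothetical) request such as document-v11.dtdx.html this yields
the nekodtd generator with the html rendition; a plain
manual/users/concepts.html yields nothing and falls through to case 1).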

3) For non-XML sources, there is a ResourceExistAction wrapping all of
this, checking whether the requested resource already exists on disk
and, if so, <map:read>'ing it to the browser/crawler using its
extension.
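
In plain code the check amounts to something like the sketch below. It
is only an approximation in Python of what the action plus reader would
do; the function name and the doc_root parameter are assumptions, and
guessing the MIME type from the extension is my interpretation of
"using its extension".

```python
import mimetypes
from pathlib import Path


def read_if_exists(doc_root, uri):
    """Return (mime_type, bytes) if the URI names a file under doc_root, else None."""
    root = Path(doc_root).resolve()
    candidate = (root / uri.lstrip("/")).resolve()
    # Refuse anything that escapes the document root (e.g. "../" tricks).
    if root not in candidate.parents:
        return None
    # Not on disk: let the XML pipelines handle the request instead.
    if not candidate.is_file():
        return None
    mime_type, _encoding = mimetypes.guess_type(candidate.name)
    return mime_type or "application/octet-stream", candidate.read_bytes()
```

A None result means the request is not a static resource and should fall
through to cases 1) and 2).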

OK - this is only a short summary but I hope it is clear. Do we move 
forward with this?

Regards,

</Steven>

________
[1] non-existent English word for the use of double extensions to
identify the type of resources, origin: 'prextension', i.e. an extension
before the real extension. Also called: hint ;-)

-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
stevenn@outerthought.org                      stevenn@apache.org

