forrest-dev mailing list archives

From Jeff Turner <je...@apache.org>
Subject Re: Metadata
Date Sun, 10 Aug 2003 09:21:09 GMT
Hi Jason,

This stuff sounds great :)  I look forward to playing with it.

One thought: how about generalising this extra pipeline, and calling it
'metadata' or 'meta' instead of 'head'?

In your implementation, everything in the **head-* pipeline originates in
the XML <header> tag, and ends up in the HTML <head> tag.  Hence naming
the pipeline '**head-*' makes sense.  But I think we can generalise this:

- Not all metadata comes from the <header> tag.  For instance, we could:
  - Fetch the page's 'Last Modified' timestamp from the filesystem.
  - Poke CVS and obtain lots of info about the file from there.
  - Use intelligent software to parse the XML, infer what concepts are
    present in the page and automatically generate metadata. [1]
  - Add a 'Creator' field, specifying the Forrest version used to create
    the page.

- Not all metadata is used solely in the HTML <head> tag.  I'd like to
  put the 'Last Modified' date in the page body, like Maven sites (see
  maven.apache.org) do.

So based on this, we could have a '**metadata-*.html' pipeline that
serves up XML conforming to a standard metadata format like Dublin
Core (http://dublincore.org/):

<metadata xmlns="http://apache.org/forrest/metadata/1.0"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>
    Essex Conservatories-Direct : The Local Answer To Your
    Conservatory Needs.
  </dc:title>
  <dc:creator>
    Apache Forrest 0.5
  </dc:creator>
  <dc:description>
    Essex, Quality conservatories and sunrooms direct and online. The
    Local Answer To Your Conservatory Needs. testing, 1, 2, 3, testing
    description
  </dc:description>
  <dc:publisher>
    YourCompany
  </dc:publisher>
  <dc:identifier>
    http://yourcompany.com/index.html
  </dc:identifier>
  <dc:language>en</dc:language>
  <dc:date>created: 2002-10-27; modified: 2002-09-20</dc:date>
</metadata>
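
As a very rough sketch of how the existing <header> content might be
mapped into that shape, something like the stylesheet below could work.
The name document2metadata.xsl, the 'creator' parameter and the default
namespace URI are placeholders for illustration, not anything that
exists in Forrest today:

```xml
<?xml version="1.0"?>
<!-- document2metadata.xsl: hypothetical name, not an existing Forrest stylesheet -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://apache.org/forrest/metadata/1.0"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- The default namespace above is a placeholder; use whatever URI
       the metadata format finally settles on. -->

  <!-- Assumed parameter: the sitemap would pass in the Forrest version -->
  <xsl:param name="creator" select="'Apache Forrest'"/>

  <xsl:template match="/document">
    <metadata>
      <dc:title><xsl:value-of select="normalize-space(header/title)"/></dc:title>
      <dc:creator><xsl:value-of select="$creator"/></dc:creator>
      <!-- Copy <meta name="description"> only if the author supplied one -->
      <xsl:for-each select="header/meta[@name='description']">
        <dc:description><xsl:value-of select="normalize-space(.)"/></dc:description>
      </xsl:for-each>
      <!-- Keywords map onto the standard dc:subject element -->
      <xsl:for-each select="header/meta[@name='keywords']">
        <dc:subject><xsl:value-of select="normalize-space(.)"/></dc:subject>
      </xsl:for-each>
    </metadata>
  </xsl:template>

</xsl:stylesheet>
```

Fields that do not live in the source XML at all (dc:date, dc:identifier
and so on) would have to come from elsewhere, which is rather the point
of not tying this pipeline to <header>.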

There is a list of standard DC elements at
http://dublincore.org/documents/dces/.
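
The pipeline itself could probably look much like your **head-*.html
one.  A minimal sketch, assuming a stylesheet that turns the source
document into the format above (document2metadata.xsl and its location
are hypothetical):

```xml
<!-- metadata: sketch only, modelled on the **head-*.html pipeline -->
<map:match pattern="**metadata-*.html">
  <!-- Same source document that the head pipeline reads -->
  <map:generate src="cocoon:/{1}{2}.xml"/>
  <!-- Hypothetical stylesheet and path; nothing skin-specific is needed
       at this stage, so it may belong outside the skin directory -->
  <map:transform src="skins/{forrest:skin}/xslt/html/document2metadata.xsl"/>
  <!-- Serve raw XML so that later stages can pick out what they need -->
  <map:serialize type="xml"/>
</map:match>
```

The aggregated page could then pull this in with a <map:part>, as in
your step 3, and site2html would be free to use dc:date in the page body
as well as in the HTML <head>.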


--Jeff


[1] See http://directory.google.com/Top/Reference/Knowledge_Management/Knowledge_Retrieval/Classification/Software/?il=1
    I have used Klarity (http://archive.klarity.com.au/) before for this.

On Fri, Aug 08, 2003 at 03:38:31PM +0100, g4 wrote:
> Hi Jeff, how's it going?
> 
> OK I've been tackling this metadata issue we talked about. Just want to 
> make sure I'm heading in the right direction and get some feedback, so 
> this is what I've done:
> 
> OK so we have this as an example content XML page:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0a//EN" 
> "document-v20.dtd">
> <document>
> 	<header>
> 		<title>Essex</title>
> 		<!--<authors>
> 			<person name="Jeff Turner" email="jefft@apache.org"/>
> 		</authors>-->
> 		<meta name="keywords">testing, 1, 2, 3, testing 
> 		keyword</meta>
> 		<meta name="description">testing, 1, 2, 3, testing 
> 		description</meta>
> 	</header>
> 	<body>
> 		<section>
> 			<title>to go</title>
> 			<subtitle>The Local Answer To Your Conservatory 
> 			Needs.</subtitle>
> 			<tagline>Quality conservatories and sunrooms direct 
> 			and online.</tagline>
> 			<p>You have successfully generated and rendered an 
> 			<link href="ext:forrest">Apache Forrest</link> site. This page is from
> 			the site template. It is found in
> 			<code>my-site/src/documentation/content/xdocs/index.xml</code>
> 			Please edit it and replace this text with content of 
> 			your own.</p>
> 		</section>
> 	</body>
> </document>
> 
> 1) so I created a new sitemap.xmap resource called "head"
> 
> <map:resource name="head">
>       <map:transform src="skins/{forrest:skin}/xslt/html/{type}.xsl">
>         <!-- Can set an alternative project skinconfig here
>         <map:parameter name="config-file" 
> value="../../../../skinconf.xml"/>
>         -->
>          <map:parameter name="path" value="{path}"/>
>       </map:transform>
> 
>       <map:serialize/>
>     </map:resource>
> 
> 2) We then have a new pipeline, thus:
> 
> <!-- header -->
>        <map:match pattern="**head-*.html">
>         <map:generate src="cocoon:/{1}{2}.xml"/>
>         <map:transform type="linkrewriter" 
> src="cocoon:/{1}linkmap-{2}.html"/>
>         <map:call resource="head">
>           <map:parameter name="type" value="head2html"/>
>           <map:parameter name="path" value="{1}{2}.html"/>
>         </map:call>
>       </map:match>
> 
> 3) And then aggregate the whole lot:
> 
> <map:part src="cocoon:/head-{0}"/>
> 
> 4) I thought that transforming the head separately made a bit more 
> sense. My only concern is whether it will slow things down when we have 
> large content files, since the content is essentially being parsed 
> twice. Anyway, here is the XSL for this (head2html):
> 
> <xsl:stylesheet version="1.0" 
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> 
> 	<xsl:param name="path"/>
> 	<xsl:include href="../../../common/xslt/html/dotdots.xsl"/>
> 	<xsl:include href="../../../common/xslt/html/pathutils.xsl"/>
> 
> 	<xsl:variable name="filename-noext">
> 		<xsl:call-template name="filename-noext">
> 			<xsl:with-param name="path" select="$path"/>
> 		</xsl:call-template>
> 	</xsl:variable>
> 	
> 	<xsl:variable name="root">
> 		<xsl:call-template name="dotdots">
> 			<xsl:with-param name="path" select="$path"/>
> 		</xsl:call-template>
> 	</xsl:variable>
> 	
> 	<xsl:template match="/">
> 		<head>
> 			<link rel="stylesheet" href="{$root}skin/main.css" 
> 			type="text/css"/>
> 		<xsl:apply-templates/>
> 		</head>
> 	</xsl:template>
> 	
> 	<xsl:template match="header">
> 		<xsl:apply-templates/>
> 	</xsl:template>
> 	
> 	<xsl:template match="title">
> 		<title><xsl:value-of select="."/> Conservatories-Direct : 
> <xsl:value-of select="//subtitle/."/></title>
> 	</xsl:template>
> 
> 	<xsl:template match="meta">
> 		<xsl:if test="@name='description'">
> 			<meta content="{//title/.}, {//tagline/.} 
> 			{//subtitle/.} {.}" name="{@name}"/>
> 		</xsl:if>
> 		<xsl:if test="@name='keywords'">
> 			<meta content="{//title/.},{.}" name="{@name}"/>
> 		</xsl:if>
> 	</xsl:template>
> 	
> 	<xsl:template match="body">
> 		<!-- ignore the <body/> part -->
> 	</xsl:template>
> 
> </xsl:stylesheet>
> 
> 
> 5) Finally we call the head from within "site2html",
> 
> ...
> <xsl:call-template name="head"/>
> ...
> <xsl:template name="head">
> 
> 	<xsl:comment>================= start Metadata items 
> ==================</xsl:comment>
> 	<xsl:apply-templates select="head"/>
> 	<xsl:comment>================= end Metadata items 
> ==================</xsl:comment>
> 
> </xsl:template>
> 
> ....
> 
> This produces :
> 
> <head>
> <META http-equiv="Content-Type" content="text/html; charset=utf-8">
> <link type="text/css" href="../skin/main.css" rel="stylesheet">
> <title>Essex Conservatories-Direct : The Local Answer To Your 
> Conservatory Needs.</title>
> <meta name="keywords" content="Essex,testing, 1, 2, 3, testing keyword">
> <meta name="description" content="Essex, Quality conservatories and 
> sunrooms direct and online. The Local Answer To Your Conservatory 
> Needs. testing, 1, 2, 3, testing description">
> </head>
> 
> 
> I am in the process of working out a character limit for the meta 
> keywords and description. This should stop the tags from being 
> overpopulated with data, should that ever arise.
> 
> Let me know if this is what you were thinking of, otherwise I can 
> re-work it ;) Also how would I go about submitting this, when it's 
> finished?
> 
> Kind regards
> 
> Jason Lane
> 
