Message-ID: <49A3A7F1.3080300@altheim.com>
Date: Tue, 24 Feb 2009 20:55:29 +1300
From: Murray Altheim
To: jspwiki-user@incubator.apache.org
Subject: Re: JSPWiki to DocBook
References: <49A2B728.7050801@Sun.COM>
In-Reply-To: <49A2B728.7050801@Sun.COM>

Frank Jennings wrote:
> Dear all,
>
> I was searching the list for information on producing structured content
> from the wiki pages. I couldn't find any.
>
> I developed this small standalone tool to produce DocBook content from
> the JSPWiki pages:
> http://code.google.com/p/wits-parser/
>
> Read Me is here:
> http://code.google.com/p/wits-parser/wiki/ReadMe
>
> I don't know if it will be of any use to people in this list. I would
> like to know if you really have a strong business case for converting
> wiki to structured documents.

Hi Frank,

When I was still at Sun we did a lot of DocBook and HTML/XHTML stuff,
as Sun's documentation is largely in DocBook (well, a DocBook subset
called SolBook). So I know DocBook very well and have no criticisms
of its use.

When transforming DocBook to XHTML one loses much of the structure,
with the only reasonable way of maintaining some of it being to
populate the 'class' attribute values of
, , and other block elements to mimic the original DocBook element
types. This is similar to what people now call "microformats" (i.e.,
it was done many years before that term was coined).

You could of course transform all of DocBook to simply and elements
with the 'class' attributes being the original DocBook element types
and a CSS stylesheet to suit. This would in effect be more appropriate
than the tag abuse of forcing DocBook's semantics into XHTML's. But
HTML/XHTML has such a long history of abuse that its semantics aren't
very strong anyway, in terms of normative practice.

One of the issues with transforming XHTML to DocBook is that one has
almost no structure to work with. There's none of the containment and
almost none of the required sequences or optional structures one finds
in DocBook. It's going from chaos to structure, and implying structure
where none is extant is a bit of tag abuse as well. With the wiki the
markup is at least a bit more regularized since it is itself a
transformation from the wiki markup. We can imply *some* of the
structures.

What I *might* recommend is looking at transforming the XHTML output
of JSPWiki into a tighter XHTML-based document type. If you look at
what is available in ISO HTML the design is actually somewhat similar
to DocBook, i.e., there's a set of numbered divisions ( through ) with
numbered headings for each. This is about as much real structure as
one finds in HTML/XHTML anyway and there's no tag abuse.

   Information technology — Document description and processing
   languages — HyperText Markup Language (HTML).
   ISO/IEC 15445:2000(E)
   https://www.cs.tcd.ie/15445/15445.HTML

   User's Guide to ISO/IEC 15445:2000 HyperText Markup Language (HTML)
   https://www.cs.tcd.ie/15445/UG.HTML

The relevant part of the ISO HTML DTD is

   ]]>

You can see how the divisions and headings mimic DocBook. The headings
could either precede the division or be the first child element. I
personally think ISO HTML should have put the heading inside of the
division since the heading is for that division. But no matter.
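
As a rough illustration of the heading-inside-division layout discussed
above, here is a minimal Python sketch that wraps a flat run of XHTML
headings and blocks in nested div elements. It is not taken from
JSPWiki or from ISO HTML: the function name nest_divisions and the
class values div1 through div6 are assumptions made up for the example,
and it assumes the input carries no XML namespace and that every
heading is a direct child of body.

```python
import xml.etree.ElementTree as ET

# Map heading tag names to their numeric level (h1 -> 1 ... h6 -> 6).
HEADINGS = {f"h{n}": n for n in range(1, 7)}


def nest_divisions(body):
    """Return a new <body> in which each heading opens a nested <div>.

    Hypothetical helper: the class names div1..div6 are an assumption
    of this sketch, not something defined by XHTML or JSPWiki.
    """
    new_body = ET.Element("body")
    # Stack of (level, container); level 0 stands for the body itself.
    stack = [(0, new_body)]
    for child in list(body):
        level = HEADINGS.get(child.tag)
        if level is not None:
            # A heading closes every open division at the same or a
            # deeper level, then opens a new division of its own.
            while stack[-1][0] >= level:
                stack.pop()
            div = ET.SubElement(stack[-1][1], "div", {"class": f"div{level}"})
            stack.append((level, div))
        # The heading becomes the first child of its division; ordinary
        # blocks are appended to the innermost open division.
        stack[-1][1].append(child)
    return new_body


flat = ET.fromstring(
    "<body><h1>Guide</h1><p>Intro</p>"
    "<h2>Install</h2><p>Steps</p>"
    "<h2>Usage</h2><p>Run it</p></body>"
)
print(ET.tostring(nest_divisions(flat), encoding="unicode"))
```

The sketch does not repair skipped levels: an h3 that directly follows
an h1 simply yields a div3 nested inside the div1, so a stricter
document type would still need validation on top of this.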

Now, I'm not actually suggesting use of ISO HTML since (a) it's
SGML- rather than XML-based, so it's incompatible with XHTML, (b) it
uses uppercase element type names, and (c) I don't actually recommend
using through (possibly through instead?). Point is, this can all be
done within the existing XHTML DTD.

If you actually wanted a more restrictive XHTML DTD for an output
structure mimicking ISO HTML's hierarchy, I'm willing to contribute
some time writing an XHTML module to do this (I might even have one
somewhere from when I did that work back in the late 90s). That is,
if you decided you wanted to do this and got to the point of needing
it.

To answer your question more directly, we've been looking into an
archive format for content coming off the wiki and have considered
DocBook, but are more likely to go with validated XHTML since it
more closely fits with the semantics of the wiki's output markup.

Murray

...........................................................................
Murray Altheim                                                     === = =
http://www.altheim.com/murray/                                     = = ===
SGML Grease Monkey, Banjo Player, Wantanabe Zen Monk               = = = =

      Boundless wind and moon - the eye within eyes,
      Inexhaustible heaven and earth - the light beyond light,
      The willow dark, the flower bright - ten thousand houses,
      Knock at any door - there's one who will respond.
                                      -- The Blue Cliff Record