incubator-jspwiki-user mailing list archives

From Murray Altheim <murra...@altheim.com>
Subject Re: JSPWiki to DocBook
Date Tue, 24 Feb 2009 07:55:29 GMT
Frank Jennings wrote:
> Dear all,
> 
> I was searching the list for information on producing structured content 
> from the wiki pages. I couldn't find any.
> 
> I developed this small standalone tool to produce DocBook content from 
> the JSPWiki pages:
> http://code.google.com/p/wits-parser/
> 
> Read Me is here:
> http://code.google.com/p/wits-parser/wiki/ReadMe
> 
> I don't know if it will be of any use to people in this list. I would 
> like to know if you really have a strong business case for converting 
> wiki to structured documents.

Hi Frank,

When I was still at Sun we did a lot of DocBook and HTML/XHTML stuff,
as Sun's documentation is largely in DocBook (well, a DocBook subset
called SolBook). So I know DocBook very well and have no criticisms
of its use.

When transforming DocBook to XHTML one loses much of the structure.
The only reasonable way of maintaining some of it is to populate the
'class' attribute values of <div>, <span>, <p> and other elements
with the original DocBook element type names. This is similar
to what people now call "microformats" (i.e., it was done many years
before that term was coined). You could of course transform all of
DocBook to simply <div> and <span> elements with the 'class' attributes
being the original DocBook element types and a CSS stylesheet to suit.
This would in effect be more appropriate than the tag abuse of forcing
DocBook's semantics into XHTML's. But HTML/XHTML has such a long
history of abuse that its semantics aren't very strong anyway, in
terms of normative practice.
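
The all-<div>/<span> transformation described above can be sketched in a
few lines of Python. This is only an illustration under stated
assumptions: the input is un-namespaced (DocBook 4.x style) XML, the
INLINE set is a hypothetical and far from complete list of inline
element types, and the function name docbook_to_generic is made up for
the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of DocBook element names treated as inline (-> <span>).
# Everything else becomes a <div>. A real mapping would cover far more.
INLINE = {"emphasis", "literal", "filename", "command", "link"}

def docbook_to_generic(elem):
    """Return a <div>/<span> copy of elem, keeping the DocBook name in 'class'."""
    out = ET.Element("span" if elem.tag in INLINE else "div", {"class": elem.tag})
    out.text = elem.text   # text before the first child
    out.tail = elem.tail   # text following this element inside its parent
    for child in elem:
        out.append(docbook_to_generic(child))
    return out

source = ("<sect1><title>Setup</title>"
          "<para>Run <command>ant</command> first.</para></sect1>")
result = ET.tostring(docbook_to_generic(ET.fromstring(source)), encoding="unicode")
print(result)
```

Each DocBook element survives only as a 'class' value, so a CSS
stylesheet keyed on those class names would have to supply all of the
presentation. Original attributes are dropped in this sketch.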

One of the issues with transforming XHTML to DocBook is that one has
almost no structure to work with. There's none of the containment and
almost none of the required sequences or optional structures one finds
in DocBook. It's going from chaos to structure, and imposing structure
where none exists is a bit of tag abuse as well. With the wiki, the
XHTML is at least a bit more regular since it is itself generated
from the wiki markup. We can infer *some* of the
structures.

What I *might* recommend is looking at transforming the XHTML output
of JSPWiki into a tighter XHTML-based document type. If you look at
what is available in ISO HTML the design is actually somewhat similar
to DocBook, i.e., there's a set of numbered divisions (<DIV1> through
<DIV6>) with numbered headings for each. This is about as much real
structure as one finds in HTML/XHTML anyway and there's no tag abuse.

   Information technology — Document description and processing
   languages — HyperText Markup Language (HTML). ISO/IEC 15445:2000(E)
   https://www.cs.tcd.ie/15445/15445.HTML

   User's Guide to ISO/IEC 15445:2000 HyperText Markup Language (HTML)
   https://www.cs.tcd.ie/15445/UG.HTML

The relevant part of the ISO HTML DTD is

   <!-- The following marked section is informative only -->
   <![ %Preparation; [
   <!ELEMENT Pre-HTML    - -  (HEAD, BODY) >
   <!ATTLIST Pre-HTML %i18n;  -- Internationalization DIR and LANG -->
   <!ELEMENT BODY        - O  ((%block;)*,(H1,DIV1)* ) +(DEL|INS) >
   <!ELEMENT H1          - -  (%text;)+ >
   <!ELEMENT DIV1        O O  ((%block;)*, (H2,DIV2)* ) >
   <!ELEMENT H2          - -  (%text;)+ >
   <!ELEMENT DIV2        O O  ((%block;)*, (H3,DIV3)* ) >
   <!ELEMENT H3          - -  (%text;)+ >
   <!ELEMENT DIV3        O O  ((%block;)*, (H4,DIV4)* ) >
   <!ELEMENT H4          - -  (%text;)+ >
   <!ELEMENT DIV4        O O  ((%block;)*, (H5,DIV5)* ) >
   <!ELEMENT H5          - -  (%text;)+ >
   <!ELEMENT DIV5        O O  ((%block;)*, (H6,DIV6)* ) >
   <!ELEMENT H6          - -  (%text;)+ >
   <!ELEMENT DIV6        O O  ((%block;)*) >
   ]]>

You can see how the divisions and headings mimic DocBook. A heading
could either precede its division, as it does here, or be the
division's first child element. I personally think ISO HTML should
have put the heading inside of the division since the heading is for
that division. But no matter.

Now, I'm not actually suggesting use of ISO HTML since (a) it's SGML
rather than XML based, so it's incompatible with XHTML, (b) it
uses uppercase element type names, and (c) I don't actually recommend
using <DIV1> through <DIV6> (possibly <div class="sect1"> through
<div class="sect6"> instead?). Point is, this can all be done within
the existing XHTML DTD.
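
As a rough sketch of that idea, the following Python wraps each heading
of a flat XHTML body, together with the content that follows it, in a
<div class="sectN"> whose N matches the heading level. The assumptions
are mine: the input is well-formed and un-namespaced, the heading goes
inside its division as the first child, skipped heading levels (an <h3>
directly under an <h1>) are not repaired, and the function name
nest_sections is invented for the example.

```python
import re
import xml.etree.ElementTree as ET

HEADING = re.compile(r"^h([1-6])$")

def nest_sections(body):
    """Wrap each heading and the content after it in <div class="sectN">."""
    out = ET.Element(body.tag, body.attrib)
    out.text = body.text
    stack = [(0, out)]  # (heading level, container); level 0 is the body itself
    for child in list(body):
        m = HEADING.match(child.tag)
        if m:
            level = int(m.group(1))
            # Close every open division at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            div = ET.SubElement(stack[-1][1], "div", {"class": f"sect{level}"})
            stack.append((level, div))
        # Headings and blocks alike go into the innermost open container.
        stack[-1][1].append(child)
    return out

flat = ("<body><p>Intro</p><h1>A</h1><p>a</p><h2>A1</h2><p>a1</p>"
        "<h1>B</h1><p>b</p></body>")
nested = ET.tostring(nest_sections(ET.fromstring(flat)), encoding="unicode")
print(nested)
```

Content before the first heading stays directly in the body, which
matches the leading (%block;)* in the BODY content model quoted earlier.
The result uses only <div>, headings and 'class' attributes, so it still
validates against the ordinary XHTML DTD.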

If you actually wanted a more restrictive XHTML DTD for an output
structure mimicking ISO HTML's hierarchy, I'm willing to contribute
some time writing an XHTML module to do this (I might even have one
somewhere from when I did that work back in the late 90s). That is, if
you decided you wanted to do this and got to the point of needing it.

To answer your question more directly, we've been looking into an
archive format for content coming off the wiki and have considered
DocBook, but are more likely to go with validated XHTML since it
more closely fits with the semantics of the wiki's output markup.

Murray

...........................................................................
Murray Altheim <murray09 at altheim dot com>                       ===  = =
http://www.altheim.com/murray/                                     = =  ===
SGML Grease Monkey, Banjo Player, Wantanabe Zen Monk               = =  = =

       Boundless wind and moon - the eye within eyes,
       Inexhaustible heaven and earth - the light beyond light,
       The willow dark, the flower bright - ten thousand houses,
       Knock at any door - there's one who will respond.
                                       -- The Blue Cliff Record
