forrest-dev mailing list archives

From Miles Elam <mi...@geekspeak.org>
Subject Re: XHTML 2 intermediate format (Re: Letting through raw HTML)
Date Fri, 24 Jan 2003 21:56:44 GMT
Jeff Turner wrote:

>On Fri, Jan 24, 2003 at 09:38:13AM +0100, Nicola Ken Barozzi wrote:
>
>>The solution IMHO would be to switch to XHTML. It doesn't have sections? 
>>I had proposed to follow XHTML2 which has them, and has all HTML features.
>
>XHTML 2 sounds like the best bet for an intermediate format, because:
>
> - It's structurally closest to HTML, so the xhtml22html.xsl stylesheet
>   would be simple.
> - There's already Docbook -> XHTML stylesheets, so supporting Docbook as
>   a source format should be quite easy.
>
>It's also pretty good as a 'source' format too.  Non-proprietary,
>politically neutral, familiar to users..
>
How proprietary is too proprietary?  DocBook is backed by the non-profit 
group OASIS (http://www.oasis-open.org/).  Many of the W3C specs had 
strong influence from industry heavyweights.  (Wasn't CSS originally a 
Microsoft proposal?)  When you say "politically neutral," I think of 
things like XML Schema.  Familiarity to users is indeed an issue, but 
most web designers don't use XHTML 1.0 yet, let alone the 
backwards-incompatible XHTML2 which does away with <br>, <img>, <h1> - 
<h6>, requires the use of CSS for display styling, etc.  The W3C is more 
popular than OASIS, but then Microsoft is more popular than the W3C.  
How big is big enough?

XHTML2 and Simplified DocBook are deceptively close in many respects.  
For example, where in XHTML2 you would write

  <section>
    <h>Section Title</h>
    <p>section content</p>
    <section>
      <h>Subsection Title</h>
      <p>subsection content</p>
    </section>
  </section>

in DocBook you would write

  <section>
    <title>Section Title</title>
    <para>section content</para>
    <section>
      <title>Subsection Title</title>
      <para>subsection content</para>
    </section>
  </section>
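
To make the comparison concrete, here is a minimal sketch of that
mapping.  It is written in Python with the standard library rather than
XSLT only so that it is self-contained, and it assumes the three element
names shown in the two fragments above are the only ones that need
renaming.  The names DOCBOOK_TO_XHTML2 and docbook_to_xhtml2 are made up
for this illustration.

```python
import xml.etree.ElementTree as ET

# Element-name mapping for the fragments above. Covering only these three
# names is an assumption of this sketch, not a complete DocBook mapping.
DOCBOOK_TO_XHTML2 = {"section": "section", "title": "h", "para": "p"}

def docbook_to_xhtml2(node):
    """Return a copy of `node` with DocBook names swapped for XHTML2 ones."""
    out = ET.Element(DOCBOOK_TO_XHTML2.get(node.tag, node.tag), dict(node.attrib))
    out.text = node.text
    out.tail = node.tail
    for child in node:
        out.append(docbook_to_xhtml2(child))  # recurse into nested sections
    return out

docbook = ET.fromstring(
    "<section><title>Section Title</title><para>section content</para>"
    "<section><title>Subsection Title</title>"
    "<para>subsection content</para></section></section>"
)
xhtml2 = docbook_to_xhtml2(docbook)
print(ET.tostring(xhtml2, encoding="unicode"))
```

Swapping the keys and values of the mapping gives the reverse direction,
which is the sense in which the two formats are close here.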

The stylesheet to convert between the two in this case is trivial.  But 
XHTML lacks many items in DocBook, especially with regard to meta 
information.  As an example:

  <article>
    <articleinfo>
      <title>Why I like DocBook</title>
      <subtitle>Although XHTML2 isn't bad either</subtitle>
      <pubdate>2003-01-24T12:34:00-08:00</pubdate>
      <authorgroup>
        <author>
          <firstname>John</firstname>
          <surname>Doe</surname>
          <honorific>PhD</honorific>
          <affiliation>
            <jobtitle>Example fodder</jobtitle>
            <orgname>DocBook Examples, Inc.</orgname>
          </affiliation>
          <email>jdoe@imaginary.com</email>
        </author>
        <author>
          <firstname>Miles</firstname>
          <surname>Elam</surname>
          <email>miles@avoidingspamharvesting.com</email>
        </author>
      </authorgroup>
      <copyright>
        <year>2002</year>
        <holder>Miles Elam</holder>
      </copyright>
      <legalnotice>
        <para>The content presented here is the property of DocBook Examples, Inc.
        Duplication without written consent is forbidden.</para>
      </legalnotice>
      <revhistory>
        <revision>
          <revnumber>1.0</revnumber>
          <date>2003-01-24</date>
          <authorinitials>ME</authorinitials>
          <revremark>Initial Revision</revremark>
        </revision>
        <revision>
          <revnumber>1.1</revnumber>
          <date>2003-01-24</date>
          <authorinitials>ME</authorinitials>
          <revremark>Fixed well-formedness errors and made spelling
            corrections</revremark>
        </revision>
      </revhistory>
      <abstract>
        <para>A full example of the benefits (drawbacks?) of using
          Simplified DocBook</para>
      </abstract>
      <keywordset>
        <keyword>simplified docbook</keyword>
        <keyword>docbook</keyword>
        <keyword>middle tier</keyword>
        <keyword>meta information</keyword>
        <keyword>semantic content</keyword>
      </keywordset>
    </articleinfo>
    <!-- *snip content* -->
  </article>

Going down the list, title is obviously handled by (X)HTML, and items 
such as subtitle, pubdate, and legalnotice can be handled roughly with a 
series of meta tags (assuming of course that meta names don't conflict 
with browser display behavior).  Legal notices are commonly held in the 
final XSLT transformation for site-wide consistency.  Then again, with 
things like an abstract and a revision history (either manually entered 
or if the document is pulled from CVS or some CMS backend), XHTML falls 
short.  You could specify a "class" attribute to the first section 
specifying that it's an abstract, of course.  And this assumes that 
people go through the effort of entering the extra metadata in the first 
place.  Then again, not every tag in DocBook needs to be used.  DocBook 
also has references published under the Free Documentation License, such 
as this one (http://www.docbook.org/tdg/simple/en/html/sdocbook.html), 
for its various elements, so you wouldn't be in the same boat you are in 
now.  (Granted, XHTML2 is likely to have far more articles, books, and 
tutorials in the future.)
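
As a rough illustration of that split, the sketch below flattens the
simple fields into meta tags and reports which children of articleinfo
have no obvious meta equivalent.  The meta names ("subtitle", "date",
"rights") and the function name are invented for this example and are
not a published convention; the input is a trimmed copy of the article
above.

```python
import xml.etree.ElementTree as ET

# Fields that flatten into one name/content pair. The meta names on the
# right are made up for this sketch, not taken from any specification.
FLAT_FIELDS = {"subtitle": "subtitle", "pubdate": "date", "legalnotice": "rights"}

def articleinfo_to_head(info):
    """Build an (X)HTML <head> from <articleinfo>; also return what was dropped."""
    head = ET.Element("head")
    dropped = []
    for child in info:
        if child.tag == "title":
            ET.SubElement(head, "title").text = child.text
        elif child.tag in FLAT_FIELDS:
            # Collapse whitespace so multi-line notices fit in one attribute.
            content = " ".join("".join(child.itertext()).split())
            ET.SubElement(head, "meta", name=FLAT_FIELDS[child.tag], content=content)
        else:
            dropped.append(child.tag)  # nested structure with no <meta> home
    return head, dropped

info = ET.fromstring("""
<articleinfo>
  <title>Why I like DocBook</title>
  <subtitle>Although XHTML2 isn't bad either</subtitle>
  <pubdate>2003-01-24T12:34:00-08:00</pubdate>
  <revhistory>
    <revision><revnumber>1.0</revnumber><date>2003-01-24</date></revision>
  </revhistory>
  <abstract><para>A full example</para></abstract>
</articleinfo>
""")
head, dropped = articleinfo_to_head(info)
print(ET.tostring(head, encoding="unicode"))
print(dropped)
```

In this sketch the revision history and the abstract end up in the
dropped list, which is the shortfall described above.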

In the end, with first tiers like Wiki, you most likely won't have this 
meta information, but since only a small subset of XHTML2 would be used 
as well, it's a wash.  If DocBook is your start and XHTML is your lingua 
franca, you lose information before you get to your presentation layer 
(meta tags don't display on the page) or it loses its semantic meaning 
(just another bunch of <p> tags in the body).  Once again, you have the 
option of using ids and classes to simulate it, but do you want the CSS 
stylesheets dependent upon definitions in the middle tier when there's 
another transformation (or more) coming?  There's a difference between starting 
with a limited set of information and limiting your set of information.
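
A small sketch of the trade-off, under the assumption that the abstract
and the body paragraphs both become <p> elements in the middle tier.
The function name and the class value "abstract" are hypothetical.

```python
import xml.etree.ElementTree as ET

def flatten(article, keep_class=False):
    """Turn the abstract and body paragraphs of a DocBook article into <p> tags."""
    body = ET.Element("body")
    for para in article.findall("./articleinfo/abstract/para"):
        p = ET.SubElement(body, "p")
        p.text = para.text
        if keep_class:
            p.set("class", "abstract")  # hypothetical class name
    for para in article.findall("./para"):
        ET.SubElement(body, "p").text = para.text
    return body

article = ET.fromstring(
    "<article><articleinfo><abstract><para>Summary text</para></abstract>"
    "</articleinfo><para>Body text</para></article>"
)
plain = flatten(article)
tagged = flatten(article, keep_class=True)
print([(p.tag, p.attrib) for p in plain])   # both paragraphs look alike
print([(p.tag, p.attrib) for p in tagged])  # abstract survives only as a class
```

Without the class, nothing downstream can tell the abstract from the
body text; with it, every later stylesheet has to know what the middle
tier meant by "abstract".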

In addition, XHTML is strictly tailored to web display (not necessarily 
a bad thing), but it limits your choices for alternate display.  There 
are HTML to FO and HTML to PDF converters, but as things move further 
away from <font> and <i> tags, these tools that don't understand CSS 
will make those output PDFs quite bland and sometimes unusable.  If you 
are going to have to put in some extra legwork for XHTML2 + CSS to PDF 
anyway, it doesn't save much effort over Simplified DocBook.  And full 
DocBook lends itself well to complete compilations (aggregation of 
articles and notes into volumes and books) whereas XHTML does not; in 
other words, there's a clear migration path for the future if needs and 
functionality become more complex.
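
As a sketch of what that aggregation could look like, assuming the
articles are already parsed and that a plain <book> wrapper with a title
is all that is wanted.  The function name and the titles are
placeholders for this illustration.

```python
import xml.etree.ElementTree as ET

def compile_book(title, articles):
    """Wrap already-parsed <article> elements in a single <book>."""
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    for article in articles:
        book.append(article)  # full DocBook allows <article> inside <book>
    return book

articles = [
    ET.fromstring("<article><title>Why I like DocBook</title></article>"),
    ET.fromstring("<article><title>Another article</title></article>"),
]
book = compile_book("Collected articles", articles)
print([a.findtext("title") for a in book.findall("article")])
```

The articles go in unchanged, so nothing written for the simplified
subset has to be redone when the collection grows.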

Also, DocBook has reference XSL stylesheets for output to both HTML and 
XSL:FO and instructions for customization 
(http://docbook.sourceforge.net/release/xsl/current/doc/).

>- XML + CSS
>
As before, once you dump some of your semantic meaning, this becomes 
more difficult.  Also, if you are already at XHTML2, why would you want 
to fall back to a non-layout-oriented markup as the final display step?

Anyway, there's my petition for Simplified DocBook in the middle tier.

- Miles


