forrest-dev mailing list archives

From Jeff Turner <je...@apache.org>
Subject Re: XHTML 2 intermediate format (Re: Letting through raw HTML)
Date Sat, 25 Jan 2003 08:30:35 GMT
On Fri, Jan 24, 2003 at 01:56:44PM -0800, Miles Elam wrote:
...
> >It's also pretty good as a 'source' format too.  Non-proprietary,
> >politically neutral, familiar to users..
> >
> How proprietary is too proprietary?

Any format used by only one tool is too proprietary.

> DocBook is backed by the non-profit group OASIS
> (http://www.oasis-open.org/).  Many of the W3C specs had strong
> influence from industry heavyweights.  (Wasn't CSS originally a
> Microsoft proposal?)  When you say "politically neutral," I think of
> things like XML Schema.

Ew :)

> Familiarity to users is indeed an issue, but most web designers don't
> use XHTML 1.0 yet, let alone the backwards-incompatible XHTML2, which
> does away with <br>, <img>, and <h1> - <h6>, requires the use of CSS for
> display styling, etc.  The W3C is more popular than OASIS, but then
> Microsoft is more popular than the W3C.  How big is big enough?

In my mind, bigger than Apache is 'big enough'.

> XHTML2 and Simplified DocBook are deceptively close in many respects.  
> For example, where in XHTML2 you would write
> 
>  <section>
>    <h>Section Title</h>
>    <p>section content</p>
>    <section>
>      <h>Subsection Title</h>
>      <p>subsection content</p>
>    </section>
>  </section>
> 
> in DocBook you would write
> 
>  <section>
>    <title>Section Title</title>
>    <para>section content</para>
>    <section>
>      <title>Subsection Title</title>
>      <para>subsection content</para>
>    </section>
>  </section>
> 
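A minimal sketch of that trivial case, in Python using only the standard library's `xml.etree.ElementTree`: it walks the XHTML2 snippet above and renames `h` to `title` and `p` to `para`, leaving `section` alone. It assumes un-namespaced elements exactly as written in the two snippets; the `XHTML2_TO_DOCBOOK` table and the function name are hypothetical, and a real Forrest pipeline would do this in XSLT.

```python
import xml.etree.ElementTree as ET

# Hypothetical rename table for the simple section case; unlisted tags pass through.
XHTML2_TO_DOCBOOK = {"h": "title", "p": "para"}


def xhtml2_to_docbook(elem: ET.Element) -> ET.Element:
    """Return a deep copy of elem with tags renamed; attributes and text are kept."""
    out = ET.Element(XHTML2_TO_DOCBOOK.get(elem.tag, elem.tag), dict(elem.attrib))
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        out.append(xhtml2_to_docbook(child))
    return out


src = ET.fromstring(
    "<section><h>Section Title</h><p>section content</p>"
    "<section><h>Subsection Title</h><p>subsection content</p></section>"
    "</section>"
)
print(ET.tostring(xhtml2_to_docbook(src), encoding="unicode"))
```

Going the other way is the same table inverted; the structure maps one-to-one here because neither snippet carries any metadata.
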
> The stylesheets to convert between the two in this case are trivial.  But
> XHTML lacks many items in DocBook especially with regard to meta 
> information.  As an example
> 
>  <article>
>    <articleinfo>
>      <title>Why I like DocBook</title>
>      <subtitle>Although XHTML2 isn't bad either</subtitle>
>      <pubdate>2003-01-24T12:34:00-08:00</pubdate>
>      <authorgroup>
>        <author>
>          <firstname>John</firstname>
>          <surname>Doe</surname>
>          <honorific>PhD</honorific>
>          <affiliation>DocBook Examples, Inc.</affiliation>
>          <jobtitle>Example fodder</jobtitle>
>          <email>jdoe@imaginary.com</email>
>        </author>
>        <author>
>          <firstname>Miles</firstname>
>          <surname>Elam</surname>
>          <email>miles@avoidingspamharvesting.com</email>
>        </author>
>      </authorgroup>
>      <copyright>
>        <year>2002</year>
>        <holder>Miles Elam</holder>
>      </copyright>
>      <legalnotice>
>        The content presented here is the property of DocBook Examples, Inc.
>        Duplication without written consent is forbidden.
>      </legalnotice>
>      <revhistory>
>        <revision>
>          <revnumber>1.0</revnumber>
>          <date>2003-01-24</date>
>          <authorinitials>ME</authorinitials>
>          <revremark>Initial Revision</revremark>
>        </revision>
>        <revision>
>          <revnumber>1.1</revnumber>
>          <date>2003-01-24</date>
>          <authorinitials>ME</authorinitials>
>          <revremark>Fixed well-formedness errors and made spelling corrections</revremark>
>        </revision>
>      </revhistory>
>      <abstract>
>        <para>A full example of the benefits (drawbacks?) of using Simplified DocBook</para>
>      </abstract>
>      <keywordset>
>        <keyword>simplified docbook</keyword>
>        <keyword>docbook</keyword>
>        <keyword>middle tier</keyword>
>        <keyword>meta information</keyword>
>        <keyword>semantic content</keyword>
>      </keywordset>
>    </articleinfo>
>    <!-- *snip content* -->
>  </article>
> 
> Going down the list, title is obviously handled by (X)HTML and items 
> such as subtitle, pubdate, legalnotice can be handled roughly with a 
> series of meta tags (assuming of course that meta names don't conflict 
> with browser display behavior).  Legal notices are commonly held in the 
> final XSLT transformation for site-wide consistency.  Then again, with 
> things like an abstract and a revision history (either manually entered 
> or if the document is pulled from CVS or some CMS backend), XHTML falls 
> short.  You could, of course, put a "class" attribute on the first section
> marking it as an abstract.  And this assumes that people go to the effort
> of entering the extra metadata in the first place.  Then again, not every
> tag in DocBook needs to be used.  DocBook also has references for its
> various elements published under the GNU Free Documentation License, like
> this one (http://www.docbook.org/tdg/simple/en/html/sdocbook.html), so you
> wouldn't be in the same boat you're in now.
> (Granted that XHTML2 is likely to have far more articles, books, and 
> tutorials in the future.)
> 
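A rough sketch of the "meta tags" approach described above, again with Python's standard `xml.etree.ElementTree`: `title` goes to `<title>`, a few fields become `<meta>` elements, and everything else is reported as having no home in `<head>`. The `META_NAMES` mapping (for example `legalnotice` to `rights`) is a hypothetical choice, not a registered meta vocabulary, and the input is a trimmed copy of the `articleinfo` example.

```python
import xml.etree.ElementTree as ET

# Hypothetical articleinfo-child -> <meta name> mapping; not a registered vocabulary.
META_NAMES = {"subtitle": "subtitle", "pubdate": "date", "legalnotice": "rights"}


def articleinfo_to_head(info: ET.Element) -> tuple[ET.Element, list[str]]:
    """Build an XHTML-style <head> from <articleinfo>; also return unmapped tags."""
    head = ET.Element("head")
    unmapped = []
    for child in info:
        # Flatten any nested markup to plain text and normalize whitespace.
        text = " ".join("".join(child.itertext()).split())
        if child.tag == "title":
            ET.SubElement(head, "title").text = text
        elif child.tag in META_NAMES:
            ET.SubElement(head, "meta", name=META_NAMES[child.tag], content=text)
        else:
            unmapped.append(child.tag)  # e.g. abstract, revhistory: no lossless slot
    return head, unmapped


info = ET.fromstring(
    "<articleinfo><title>Why I like DocBook</title>"
    "<subtitle>Although XHTML2 isn't bad either</subtitle>"
    "<pubdate>2003-01-24T12:34:00-08:00</pubdate>"
    "<abstract><para>A full example</para></abstract>"
    "<revhistory><revision><revnumber>1.0</revnumber></revision></revhistory>"
    "</articleinfo>"
)
head, lost = articleinfo_to_head(info)
print(lost)  # ['abstract', 'revhistory']
```

The `lost` list is exactly the abstract-and-revision-history gap described above.
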
> In the end, with first tiers like Wiki, you most likely won't have this 
> meta information, but since only a small subset of XHTML2 would be used 
> as well, it's a wash.  If DocBook is your start and XHTML is your lingua 
> franca, you lose information before you get to your presentation layer 
> (meta tags don't display on the page) or it loses its semantic meaning
> (just another bunch of <p> tags in the body).  Once again, you have the 
> option of using ids and classes to simulate it, but do you want the CSS 
> stylesheets dependent on definitions in the middle tier when there's
> another transformation(s) coming?  There's a difference between starting 
> with a limited set of information and limiting your set of information.
> 
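The "ids and classes" workaround can be sketched the same way (Python, standard `xml.etree.ElementTree`). The abstract survives only as the string `class="abstract"`, so every later stage, CSS included, has to know that convention to recover it. The class name and both functions are hypothetical.

```python
import xml.etree.ElementTree as ET


def abstract_to_xhtml(abstract: ET.Element) -> ET.Element:
    """Flatten a DocBook <abstract> into <section class="abstract"> with <p> children."""
    sec = ET.Element("section", {"class": "abstract"})
    for para in abstract.iter("para"):
        ET.SubElement(sec, "p").text = "".join(para.itertext())
    return sec


def is_abstract(elem: ET.Element) -> bool:
    """A later stage can only recognize the abstract via the shared class convention."""
    return elem.tag == "section" and "abstract" in elem.get("class", "").split()


sec = abstract_to_xhtml(ET.fromstring("<abstract><para>A full example</para></abstract>"))
print(is_abstract(sec), is_abstract(ET.Element("section")))
```

Drop or rename the class anywhere in the pipeline and the content is, in Miles's words, just another bunch of `<p>` tags.
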
> In addition, XHTML is strictly tailored to web display (not necessarily 
> a bad thing), but it limits your choices for alternate display.  There 
> are HTML to FO and HTML to PDF converters, but as things move further 
> away from <font> and <i> tags, these tools that don't understand CSS 
> will make those output PDFs quite bland and sometimes unusable.  If you 
> are going to have to put in some extra legwork for XHTML2 + CSS to PDF
> anyway, it doesn't save much effort over Simplified DocBook.  And full 
> DocBook lends itself well to complete compilations (aggregation of 
> articles and notes into volumes and books) whereas XHTML does not; in 
> other words, there's a clear migration path for the future if needs and 
> functionality become more complex.
> 
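A minimal sketch of such a compilation, assuming full DocBook, whose `book` element may contain `article` children alongside chapters. It uses Python's standard `xml.etree.ElementTree`; `compile_book` and the sample titles are hypothetical, and a real build would also validate the result against the DocBook DTD.

```python
import copy
import xml.etree.ElementTree as ET


def compile_book(title: str, articles: list[ET.Element]) -> ET.Element:
    """Aggregate standalone DocBook <article> trees into a single <book>."""
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    for art in articles:
        book.append(copy.deepcopy(art))  # copy so the source articles stay untouched
    return book


a1 = ET.fromstring("<article><title>Why I like DocBook</title></article>")
a2 = ET.fromstring("<article><title>XHTML2 in the middle tier</title></article>")
book = compile_book("Forrest Notes", [a1, a2])
print([child.tag for child in book])  # ['title', 'article', 'article']
```
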
> Also, DocBook has reference XSL stylesheets for output to both HTML and 
> XSL:FO and instructions for customization 
> (http://docbook.sourceforge.net/release/xsl/current/doc/).
> 
> >- XML + CSS
> >
> As before, once you dump some of your semantic meaning, this becomes 
> more difficult.  Also, if you are already at XHTML2, why would you want 
> to fall back to a non-layout oriented markup as the final display step?


I think this thread illustrates why OSS works despite lack of formal
design work.  All these intelligent contributions forcing one to think :)


There are two issues here:

1) Assuming we have an intermediate format, is XHTML2 (or Docbook?)
suitable?
2) Is XHTML2 an appropriate 'source' format?


2) Doesn't matter for now.  I imagine we'd support both, or *at least*
Docbook.

For 1), I can't see how Docbook could make a decent intermediate format.
It's not designed for that.  It's too 'semantic'.  For example, say we
invent a source syntax for describing directory hierarchies:

<dir id="somedir">
  <file id="README.txt" desc="README file"/>
  <file id="build.xml" desc="Ant build file"/>
  <dir id="src">
    <dir id="java" desc="Java Source code">
    </dir>
  </dir>
</dir>

How can we possibly transform this into Docbook?

Forrest's doc-v11 format suffers the same problem.  We resort to abusing
tags like <code> and <table> to indicate a certain presentation.
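For contrast, here is one presentational rendering of the `<dir>` example: a nested list, which an XHTML2-like intermediate can express directly. This is a Python sketch using the standard `xml.etree.ElementTree`; the `class="dirtree"` hook for a stylesheet and the label format are hypothetical choices.

```python
import xml.etree.ElementTree as ET

SOURCE = """
<dir id="somedir">
  <file id="README.txt" desc="README file"/>
  <file id="build.xml" desc="Ant build file"/>
  <dir id="src">
    <dir id="java" desc="Java Source code">
    </dir>
  </dir>
</dir>
"""


def label(node: ET.Element) -> str:
    """Name, a trailing slash for directories, and the description if any."""
    name = node.get("id", "") + ("/" if node.tag == "dir" else "")
    desc = node.get("desc")
    return f"{name} ({desc})" if desc else name


def to_li(node: ET.Element) -> ET.Element:
    """Render one <dir>/<file> node as <li>, nesting a <ul> for child entries."""
    li = ET.Element("li")
    li.text = label(node)
    if len(node):  # counts element children only; whitespace text is ignored
        ul = ET.SubElement(li, "ul")
        for child in node:
            ul.append(to_li(child))
    return li


def dir_to_list(root: ET.Element) -> ET.Element:
    """Wrap the whole tree in a single top-level <ul>."""
    ul = ET.Element("ul", {"class": "dirtree"})
    ul.append(to_li(root))
    return ul


tree = dir_to_list(ET.fromstring(SOURCE.strip()))
print(ET.tostring(tree, encoding="unicode"))
```

The structure comes across as plain nested lists; how it looks is left to the final HTML or FO stage.
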

Whatever the intermediate format is, it must contain *less* semantics and
*more* presentation than source formats.  However it cannot contain more
'presentation' than the destination format (HTML, PDF), so it cannot be
something like XSLFO.  Our intermediate format must sit in the middle of
a gradient:


SEMANTIC                                        PRESENTATIONAL

authors
HR-XML                                
Docbook                       /---> HTML
doc-v11   >---->  Intermediate
myformat                      '---> XSL:FO ---> PDF
...
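The gradient above amounts to a hub-and-spoke pipeline: each source format needs one transform into the intermediate, and each output needs one transform out of it, so N sources and M outputs cost N + M transforms rather than N × M. Below is a minimal Python sketch under those assumptions, using an XHTML2-like intermediate of `section`/`h`/`p` where, as in XHTML2, heading level comes from section nesting. The `myformat` tag names are hypothetical, and the FO back end only emits bare `fo:block`s, not a complete FO document.

```python
import xml.etree.ElementTree as ET

FO = "{http://www.w3.org/1999/XSL/Format}"  # XSL-FO namespace


def rename(elem: ET.Element, table: dict[str, str]) -> ET.Element:
    """Copy elem recursively, renaming tags via table (unlisted tags pass through)."""
    out = ET.Element(table.get(elem.tag, elem.tag))
    out.text = elem.text
    for child in elem:
        out.append(rename(child, table))
    return out


# Front ends: source format -> intermediate (<section>/<h>/<p>).
FRONT_ENDS = {
    "docbook": lambda e: rename(e, {"title": "h", "para": "p"}),
    "myformat": lambda e: rename(e, {"part": "section", "heading": "h", "text": "p"}),
}


def to_html(sec: ET.Element, depth: int = 1) -> ET.Element:
    """Intermediate -> HTML; section nesting depth picks h1..h6."""
    div = ET.Element("div")
    for child in sec:
        if child.tag == "h":
            ET.SubElement(div, f"h{min(depth, 6)}").text = child.text
        elif child.tag == "p":
            ET.SubElement(div, "p").text = child.text
        elif child.tag == "section":
            div.append(to_html(child, depth + 1))
    return div


def to_fo(sec: ET.Element) -> ET.Element:
    """Intermediate -> nested fo:block elements, headings in bold."""
    block = ET.Element(FO + "block")
    for child in sec:
        if child.tag == "h":
            ET.SubElement(block, FO + "block", {"font-weight": "bold"}).text = child.text
        elif child.tag == "p":
            ET.SubElement(block, FO + "block").text = child.text
        elif child.tag == "section":
            block.append(to_fo(child))
    return block


# Back ends: intermediate -> output format.
BACK_ENDS = {"html": to_html, "fo": to_fo}


def publish(src: ET.Element, source: str, target: str) -> ET.Element:
    """Any source reaches any output through the single intermediate."""
    return BACK_ENDS[target](FRONT_ENDS[source](src))


doc = ET.fromstring(
    "<section><title>T</title><para>x</para>"
    "<section><title>S</title><para>y</para></section></section>"
)
print(ET.tostring(publish(doc, "docbook", "html"), encoding="unicode"))
```

Adding a new source format means writing one front end; both outputs then come for free.
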


So, what XML format can encapsulate the presentational aspects of all our
'source' formats (resumes, project docs, user manuals, etc) yet isn't
*so* well defined that we can't transform it into both HTML and XSL:FO?

I think XHTML2 is the best candidate.

My understanding is that XHTML 1.1 and above are broken into modules, and
it is possible to cleanly extend XHTML by adding new modules (e.g. SVG).
So for example, if we wanted to include metadata, we'd throw some RDF
into the <head> tag and call it a module.  As an intermediate format,
XHTML2 would be just a base on which to build.
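A sketch of the markup side of that idea, using Python's standard `xml.etree.ElementTree`: an RDF description with Dublin Core properties inside `<head>`. It only produces the elements; it does not define an XHTML module, which would also need a DTD or schema module. The XHTML 1.x namespace URI and the choice of Dublin Core are assumptions here, since the XHTML2 drafts' namespace may differ.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"  # assumption: XHTML 1.x namespace URI
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set

ET.register_namespace("", XHTML)
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def head_with_rdf(title: str, creator: str, date: str) -> ET.Element:
    """Return an XHTML <head> carrying an rdf:RDF block about the current document."""
    head = ET.Element(f"{{{XHTML}}}head")
    ET.SubElement(head, f"{{{XHTML}}}title").text = title
    rdf = ET.SubElement(head, f"{{{RDF}}}RDF")
    # rdf:about="" refers to the document itself.
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description", {f"{{{RDF}}}about": ""})
    ET.SubElement(desc, f"{{{DC}}}creator").text = creator
    ET.SubElement(desc, f"{{{DC}}}date").text = date
    return head


head = head_with_rdf("XHTML 2 intermediate format", "Jeff Turner", "2003-01-25")
print(ET.tostring(head, encoding="unicode"))
```
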


--Jeff


> Anyway, there's my petition for Simplified DocBook in the middle tier.
> 
> - Miles
> 
> 
