xml-general mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject The need for a site-wide XML-based information system
Date Sun, 21 Nov 1999 00:39:32 GMT

I'm currently in the dirty process of moving Cocoon from java.apache to
xml.apache. Since both projects use the same approach for code
revisioning (CVS) that transition was not painful.

On the other hand, the documentation patterns that java.apache adopts
site-wide (which I personally designed) are based on simple yet very
effective contracts that helped reduce the overhead of site management
without hurting the scalability of the project as a whole.

I would like to show you how this works today in java.apache and I would
also like to show you the reasons that brought me to write Cocoon in the
first place, because, yes, Cocoon was designed as a management tool for
the java.apache web site and grew up to become a collection of new
publishing patterns and ideas. Still, most of my experience comes in
handling such a centralized documentation system with distributed


The java.apache.org web site is composed of

1) a graphical and architectural framework of HTML documents, and
building scripts (in their own CVS module)

2) the HTML documents contained in the /docs directory of every hosted
CVS module

The idea is rather simple: each project works independently on its own
documents using HTML and whatever style it likes (a very simple look
and feel was designed as a guideline but was not mandated). These
documents are the same ones distributed with the software.

When required, the scripts are manually executed on the hosting machine;
they run a series of "cvs update" commands on the site and recreate the
directory structure, which looks something like this:

/                <--- index.html and style frames
/main            <--- site own documents (like news, TOC, etc.)
/images          <--- site-wide images
/<project>       <--- each project has its own directory
/<project>/dist  <--- each project has its own distribution section

The scripts update the site as follows (pseudo-shell code):

 cvs update site
 for each project
  cd project
  cvs update project/docs

Problems with this approach

The first problems were due to my own esthetic needs: I came to know the
web from a graphic designer perspective and I think a web site (like any
other GUI) must be appealing to be functional but should also be
carefully tuned for speed and usability.

A lot of effort was put into making java.apache appealing, usable,
easy to manage and capable of scaling. For this reason, the use of
frames allowed us to reduce style and linking contracts between pages
and HTML authors without requiring special template systems (like
Jakarta does, for example).

On the other hand, the use of a common look and feel required too much
graphical knowledge and too much overhead to be self-maintaining in the
long term. So, everyone used very basic HTML tags that were codified as
the look and feel, almost as a style-pruned XML-ish HTML.

This system was designed 11 months ago and, right after finishing it, I
started to write Cocoon as a way to move out of these style problems.
Pierpaolo's work on Stylebook grew out of these same conclusions, which
we shared during the first months of Cocoon's very basic operation: he
was aiming at batch site generation, while I was more tempted by
XML-based live web applications.

Moving to XML

Once you get the components done, doing an XML paradigm shift should be
a piece of cake. Wrong.

You know that XML is nothing without a DTD and a DTD is useless without
an application that "recognizes it" or a transformation-sheet that
changes it into something that an application understands.

So, in order to XML-ize project docs, you need a DTD, ideally a
site-wide DTD, so that stylesheets can be reused between projects and all
docs end up with the same look and feel. Currently, the xml.apache.org
web site is created using DTDs that Pier defined, which are very basic but
fairly complete in a software documentation sense.

As you can see from xml.apache.org, the results can be rather
impressive, yet simple and straightforward to maintain for non-graphical
people and content owners.

The need for site-wide DTDs

DTDs should not change frequently and backward-incompatible changes
should be reduced to zero. Still, there must be a place where DTD changes
are discussed, voted on and approved. The adoption of particular DTDs for
documentation writing must also be imposed at a site-wide level.

The problem of hard hyperlinks

Hyperlinks pose a significant problem. Suppose you have an XML fragment
like this:

  <p>Here is the <link href="manual.xml">user manual</link></p>

It clearly indicates that "user manual" should be connected to the
"manual.xml" file. On the other hand, the manual.xml file could have
been transformed into "manual.html" to be served as "text/html" by the
web server, in which case the link no longer points at the file that is
actually served.

It is clear that the use of HTML-style hyperlinking is not enough to
handle XML documentation. Unfortunately, the XLink spec is not powerful
enough to handle even these cases, since all hyperlinks are considered
"hard" and immutable. Here is how I would do it:

 <p>Here is the <link xlink:href="manual.xml"
                      xlink:mode="soft">user manual</link></p>

where the "xlink:mode" attribute was invented here to allow XLink
interpreters to distinguish between hard links (where href should not be
touched) and soft links (where href can be considered not as a URI but
as a key into a URI map).

This soft mode will allow document processors to "rewire" the documents
based on some site map that is also used to create the documents. Of
course, this attribute does not make sense on the client side since all
client-side-interpreted links must be considered "hard".
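
To make the idea concrete, here is a minimal sketch of such a rewiring
step. It assumes the usual XLink namespace URI, a plain dictionary as the
site map, and a function name (rewire_soft_links) made up for this
example; "xlink:mode" is the attribute invented above, not part of the
XLink spec. The namespace declaration on the <p> element is added only so
that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK_NS}}}href"
MODE = f"{{{XLINK_NS}}}mode"  # the invented attribute proposed in this message


def rewire_soft_links(xml_text, site_map):
    """Replace the href of every soft link with its site map entry."""
    ET.register_namespace("xlink", XLINK_NS)
    root = ET.fromstring(xml_text)
    for element in root.iter():
        key = element.get(HREF)
        if element.get(MODE) == "soft" and key is not None:
            # Soft hrefs are keys into the map; unknown keys are left as-is.
            element.set(HREF, site_map.get(key, key))
            # Assumption: the mode only matters to the processor, so it is
            # dropped from the output sent to the client.
            del element.attrib[MODE]
    return ET.tostring(root, encoding="unicode")


source = (
    f'<p xmlns:xlink="{XLINK_NS}">Here is the '
    '<link xlink:href="manual.xml" xlink:mode="soft">user manual</link></p>'
)
print(rewire_soft_links(source, {"manual.xml": "manual.html"}))
```

Links without the soft mode are never touched, so a hard link to
"manual.xml" would keep pointing at the XML file.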

Moving Javadoc into XML

The Cocoon project is currently working on writing a JavaDOC DTD and
implementing an XML doclet to allow javadocs to be generated using XML
and without containing any style information.

This will allow inline code documentation to be XML and XSL processed
for styling, filtering, transformation or even more complex operations.
The plan is also to allow the inclusion of syntax-highlighted source
code inside the javadocs to create a sort of "annotated code" with
high visual appeal.

Also, it would be interesting to evaluate the use of javadoc->XMI
transformations for direct UML-like diagrams, but this is not in our
plans at this point.


Information systems pose the real scalability problem in current open
source projects. Document writers should face the lowest possible energy
gap, so that they are at full speed in minutes rather than days. Also, the
project must be able to scale as more people work on the documentation.

Also, good integration with services (webCVS, bugtracking, todolists) is
of incredible importance.

While other projects are creating the machinery and the needed web
applications (Brian's Tigris, Jon's Jyve), I would like to see this site
showing the power of XML for scalable web based information systems.

Along, of course, with the plan of integrating Stylebook's batch
functionality with the next generation of Cocoon.

So, please, let's discuss the items here so that we can start creating
and proposing those patterns that allow projects to operate coherently
with one another.

Sorry for the long letter, but this is a really important aspect of our
work and, IMO, deserves such a long note.

Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
