Message-ID: <388EF70F.2B628982@apache.org>
Date: Wed, 26 Jan 2000 14:30:55 +0100
From: Stefano Mazzocchi <stefano@apache.org>
Organization: Apache Software Foundation
To: cocoon-dev@xml.apache.org, Apache XML <general@xml.apache.org>
Subject: Re: A better model for site generation
References: <388DC4FD.1593F21@apache.org> <388E09DB.21CA5691@apache.org>

Mike Pogue wrote:
> 
> Stefano Mazzocchi wrote:
> >
> > I've been playing around with Stylebook for the last two days to come up
> > with a better look and feel since our documents use more complex DTDs
> > and require more expressive power.
> >
> 
> Technically, look and feel should be the role of the Style, not the role
> of the DTD.

Yes, you're right, but you know what? After playing around with a bunch
of DTDs, trying to apply DTD inheritance using empty entities and such
(see below), I feel there is a _great_ need for design patterns for DTD
generation and, even more important, for style abstraction.

HTML is anything but style-abstract, even plain HTML + CSS2.

> However, if there is additional *semantic* tagging that you feel is
> needed, there's definitely the opportunity to add this.

I'd like to briefly discuss something with you guys, since it could
lead to interesting ideas and patterns.

In the Cocoon project we have 5 DTDs: document, spec, faqs, changes,
todo (and bugs in the making)

Document is the most complex one and it follows the W3C spec DTD, which
is "extensible" in this sense:

<!ENTITY % local.markup "">
<!ENTITY % markup "strong|em|code|sub|sup%local.markup;">

So, if you create your ExtendedDocument DTD, you write

<!ENTITY % local.markup "|shout">
<!ENTITY % document-dtd SYSTEM "document.dtd">
%document-dtd;

and then define what "shout" means.
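
A minimal sketch of that last step, assuming "shout" is simply one more
inline element that may contain text and the usual markup (the content
model and the file name are assumptions for illustration, not something
the W3C spec DTD prescribes):

<!-- extended-document.dtd (hypothetical file name) -->
<!-- Declared first: when an entity is declared twice, the first
     declaration wins, so the empty default in document.dtd is ignored. -->
<!ENTITY % local.markup "|shout">
<!ENTITY % document-dtd SYSTEM "document.dtd">
%document-dtd;
<!-- %markup; now expands to strong|em|code|sub|sup|shout -->
<!ELEMENT shout (#PCDATA|%markup;)*>

Every content model in document.dtd that refers to %markup; accepts
"shout" from then on, without touching document.dtd itself.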

This is a very simple (and dirty, I admit) way of doing DTD inheritance
without the need for complex stuff like Architectural Forms and without
the DTD fragmentation used in the site DTDs (which is not extensible in
this sense).

Now, let us suppose we have a stylesheet for the document DTD. Creating
a stylesheet for the extended-document DTD is then a matter of importing
the previous stylesheet and writing a template for the new element.
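
As a rough sketch, assuming the existing stylesheet lives in a file
called document2html.xsl and that "shout" should come out as emphasized
HTML (both the file name and the HTML output are assumptions made up
for this example):

<?xml version="1.0"?>
<!-- extended-document2html.xsl (hypothetical file name) -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:import must precede all other top-level elements;
       imported templates have lower precedence than local ones -->
  <xsl:import href="document2html.xsl"/>

  <!-- the only new rule: how to render the added element -->
  <xsl:template match="shout">
    <strong class="shout"><xsl:apply-templates/></strong>
  </xsl:template>

</xsl:stylesheet>

Everything else (strong, em, code and so on) is still handled by the
imported templates.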

This works for the document <-> spec pair, where this DTD inheritance
is used to add bibliography capabilities to the document (our documents
don't require that semantic power, only specs do).

But the other DTDs have a different sense of inheritance: they reuse
part of the code in the document DTD, but they do not set any local.xxx
entities, so they don't _expand_ its semantic meaning, they simply use
what's already there.

For example, the FAQ DTD reuses the %markup; entity that includes all
the markup tags with no additions. In some sense, FAQs, Changes and Todo
provide a higher level of abstraction and, for this reason, they are not
transformed into HTML directly, but go through two steps:

 faqs --> document --> html

while spec (which adds semantic meaning) goes

 spec -(imports document)-> html

Can you see the difference? Is that clear enough? (I ask because this is
_really_ hard stuff to see, it took me months, and the line between the
two approaches is very thin.)

This pattern is not a solid one, but it's the most solid I could come up
with:

 1) if you extend the semantic capabilities with local.xxx entities,
you'd better import the original stylesheet and improve it from there.
 2) otherwise, if you reuse parts and add higher-level elements which do
not require semantic changes, transform your more-structured document
into a plain document and then apply the regular document stylesheet.
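
To make case 2) concrete, here is a minimal sketch of the
faqs --> document step. The element names (faqs, faq, question, answer
on one side; document, body, s1, p on the other) and the title attribute
are assumptions for illustration and are not taken from the actual
Cocoon DTDs. The DTD fragment also assumes %markup; has already been
pulled in from the document DTD:

<!-- faqs.dtd sketch: reuses %markup; as is, sets no local.xxx entity -->
<!ELEMENT faqs     (faq+)>
<!ELEMENT faq      (question, answer)>
<!ELEMENT question (#PCDATA|%markup;)*>
<!ELEMENT answer   (#PCDATA|%markup;)*>

<?xml version="1.0"?>
<!-- faqs2document.xsl (hypothetical file name) -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- the whole FAQ list becomes one plain document -->
  <xsl:template match="faqs">
    <document>
      <body>
        <xsl:apply-templates select="faq"/>
      </body>
    </document>
  </xsl:template>

  <!-- each faq becomes a section: the question is flattened to a
       string for the title, the answer becomes a paragraph -->
  <xsl:template match="faq">
    <s1 title="{question}">
      <p><xsl:apply-templates select="answer/node()"/></p>
    </s1>
  </xsl:template>

  <!-- inline markup is shared with the document DTD: copy it through -->
  <xsl:template match="strong|em|code|sub|sup">
    <xsl:copy><xsl:apply-templates/></xsl:copy>
  </xsl:template>

</xsl:stylesheet>

The result contains nothing but document-level elements, so the regular
document stylesheet can take it from there, and a change of skin never
touches the FAQ-specific code.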

One thing is for sure: it allows you to "measure" the power of your main
DTD, since sometimes you are not able to follow number 2).

For example:

In the Changes DTD, we simply list the changes in Cocoon from revision
to revision. The DTD is simple but inherits the markup and linking
features of the Document DTD. Being a higher-level structure and not
adding any semantic capabilities to what's inherited, Changes is
transformed into Document.

But I wanted to add small images up front, visually indicating changes,
fixes, deletes and such. Kind of a pretty show-off of Stylebook
capabilities :) It turns out that the site skin places all the images on
the right side. As you can see from the web site, this is not visually
appealing at all.

So, how do we proceed? This is just an example, but it shows the pattern
I used, and it could be general enough for other needs:

- if I changed the stylesheet, all the images would go to the left, and
I'd like some of them on the right.
- I also wanted to center them and maybe surround them with a nice
border and a caption indicating what each one is.
- so, I thought, I could either

 a) add an attribute to indicate the location of the image
 b) create different elements for the different needs

If I choose a), I add styling ideas directly into the content context,
thus totally breaking separation. This is what happened with HTML when
more powerful graphic capabilities were needed.

If I choose b), I end up adding a bunch of elements for every possible
use of an image.

I believe b) is the right choice and this is what I did:

 - icon -> this is a text-inline element: it places the image where the
element is, and is normally used for small icons or small inline
graphics. This is what I use with changes to add the icons on the left.

 - figure -> this is a block element: it places the image as a block,
normally centered with a caption or such.

 - img -> this is another text-inline element, but it's normally used
for a bigger figure with text around it, like in the current site skin.
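
A rough sketch of how the three could be declared, and of how a skin
could render them. The attribute names (src, alt), the use of alt as
the figure caption and the HTML produced are all assumptions made up
for this example, not the actual Stylebook DTD or skin:

<!-- DTD side: three elements, no position attribute on any of them -->
<!ELEMENT icon EMPTY>
<!ATTLIST icon   src CDATA #REQUIRED
                 alt CDATA #REQUIRED>
<!ELEMENT figure EMPTY>
<!ATTLIST figure src CDATA #REQUIRED
                 alt CDATA #REQUIRED>
<!ELEMENT img EMPTY>
<!ATTLIST img    src CDATA #REQUIRED
                 alt CDATA #REQUIRED>

<!-- Skin side: templates to place inside the skin's xsl:stylesheet -->

<!-- icon: stays in the text flow, exactly where the element is -->
<xsl:template match="icon">
  <img src="{@src}" alt="{@alt}" align="middle"/>
</xsl:template>

<!-- figure: a centered block, with the alt text reused as caption -->
<xsl:template match="figure">
  <div align="center">
    <img src="{@src}" alt="{@alt}"/>
    <br/>
    <i><xsl:value-of select="@alt"/></i>
  </div>
</xsl:template>

<!-- img: floated to the right with text flowing around it -->
<xsl:template match="img">
  <img src="{@src}" alt="{@alt}" align="right"/>
</xsl:template>

The documents only say what kind of image it is; where it ends up is
decided entirely by the skin, so another skin can place the same three
elements differently without any change to the content.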

> > But every time I think about "forking" the DTDs, I feel bad, knowing the
> > pain that will generate.
> >
> 
> What changes are you suggesting?  Could they be done in an upward
> compatible way?

Possibly so. I'll try to come up with a summary of the required changes
ASAP (consider that this is not a very high priority for me right now).
 
> I suggest we use the "theory of parsimony" here, i.e. the DTD should be
> as simple as possible (but not simpler).  

I love this, really, a very nice way to put it. I totally agree. But
let's take an example: do you think the img|figure|icon thing is
becoming too complex?

> For example, Docbook is extremely complex,
> even though its expressive power is very high.  Even though each
> additional feature provides *some* additional value, it also makes it
> harder to learn (which decreases value).

True, but what happens when you need something that's not there?

Anyway, the differences in the semantic capabilities of our two DTDs
are minimal: just a little more structure, maybe.
 
> Let's be very careful not to add more functions, just because we can
> (the eternal temptation, especially when it's real easy to add new
> functions)...I'd like to keep the tagset to where any "Joe Writer" can
> use it, without taking a course in XML first!

Right, let's use the "icon" deal as a test. Tell me what you think and
I'll act accordingly. (Note that I don't think we need any more of those
images, but it's just an example.)
 
> <snip>
> 
> > On the other hand, we need a better model for site generation
> 
> Agreed.
> 
> <snip>
> 
> > The idea is simple:
> >
> > 1) every project has its own docs files, DTD and skin (those who wish to
> > use global ones, will have the CVS symlinked for them).
> 
> Agreed.
> 
> > 2) doc writers should work on their docs and forget about anything else
> 
> Yep.  However, they also shouldn't have to learn a new tag set each
> month....

Sure, this is why I'd like to have one and only one DTD for the site
that everyone is happy to use. But note that if we extend a DTD (say,
again, we add icon to the tagset), this is completely backward
compatible: every existing document remains valid. We should agree on a
document structure that is flexible and easy enough.
 
> > 3) we should not move things around: it's easy to make mistakes
> 
> Yep.
> 
> > 4) generating the site should be as easy as: login; cd
> > /home/www/xml.apache.org; ant site
> 
> I'd suggest that the site generation should not be dependent on ANT
> (or any other particular tool), by abstracting into a script, i.e.
>         login;
>         cd /home/www/xml.apache.org;
>         build-site;  (which might internally call ANT or something else)

Agreed, no problemo at all (the java.apache.org site has a Perl script
and a shell script to generate it, while jakarta.apache.org has a shell
script that calls a simple Java template tool).

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche