forrest-dev mailing list archives

From Stefano Mazzocchi
Subject Re: [RT] Entities in XML docs
Date Mon, 30 Dec 2002 03:07:17 GMT
Joerg Pietschmann wrote:
> On Sunday 29 December 2002 04:47, Jeff Turner wrote:
>>That was Stefano's suggestion: that we do text-only expansion for now
>>(element expansion is still possible with xinclude), and when we migrate
>>to a decent schema language we can think about removing the text-only
> Why not migrate to either a more powerful schema language
> or another validation process right now?
> AFAIR your proposal was meant as a mechanism to supplant
> XML entities, in particular in contexts where it is hard for users
> to get their entity definitions into the DTD.
> The problem you want to avoid is that a document with <xi:include>
> or <nn:replace> would not validate.
> Entities work because they are part of the DTD against which the
> parser validates and because the parser expands them before
> examining the context for validation.
> In any other approach, the parser does not know about the
> substitutions to be made. Because the validation is, historically,
> still an integral part of the parsing step, rather than a separate
> step, this may cause problems. This is independent of whether
> the substitution is done by XInclude, an XSLT replacing
> <nn:replace> elements or ${} substitution.
> This doesn't mean we can't solve the problem: Run a processor
> doing the expansion, then a validator. If performance doesn't
> matter all that much, an intermediate file can be used. Unfortunately,
> I don't know of any validator taking a SAX event stream as input
> for better performance, but I'm sure if the need arises, someone
> will take care of this.

JClark's RNG validator, Jing, works as a SAX filter.
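
The expand-then-validate arrangement Joerg describes can run as a single SAX chain, with no intermediate file. What follows is only a minimal Python sketch, assuming a plain name-to-text token map. `TokenExpander`, `RecordingValidator` and `run_pipeline` are hypothetical names, and `RecordingValidator` is a toy stand-in for a real validator such as Jing; it does not reflect Jing's API.

```python
import io
import re
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase

TOKEN = re.compile(r"\$\{([^{}$]+)\}")


class TokenExpander(XMLFilterBase):
    """SAX filter that expands ${name} tokens in character data."""

    def __init__(self, parent, tokens):
        super().__init__(parent)
        self._tokens = tokens
        self._pending = []  # text chunks waiting for the next element event

    def _lookup(self, match):
        # Unknown tokens are passed through unchanged.
        return self._tokens.get(match.group(1), match.group(0))

    def _flush(self):
        # Parsers may split text into several chunks, so join them
        # before looking for tokens, then forward one text event.
        if self._pending:
            text = "".join(self._pending)
            self._pending.clear()
            super().characters(TOKEN.sub(self._lookup, text))

    def characters(self, content):
        self._pending.append(content)

    def startElement(self, name, attrs):
        self._flush()
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        super().endElement(name)


class RecordingValidator(ContentHandler):
    """Toy stand-in for a validator: records the events it is shown."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


def run_pipeline(xml_bytes, tokens):
    """Parse, expand tokens, and hand the expanded stream downstream."""
    validator = RecordingValidator()
    expander = TokenExpander(xml.sax.make_parser(), tokens)
    expander.setContentHandler(validator)
    expander.parse(io.BytesIO(xml_bytes))
    return validator.events


print(run_pipeline(b"<foo>Version ${version}</foo>", {"version": "0.3"}))
```

The downstream handler only ever sees the expanded text, which is the property a validating stage would need.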

Still, the issue is: do we *really* want to maintain the structure of 
our documents in both DTDs and RNGs at the same time?

But wait: JClark has a working RNG -> DTD/XMLSchema converter.

NOTE: this conversion is *intrinsically* lossy since RNG is *more*
powerful in some areas than DTD and XMLSchema, but the tool tries to
guess the best thing to do. Quite impressive internal design, to
be honest, like all of JClark's work.

> The only remaining problem is schema-directed editors.

We might:

  1) describe our documentation structure in RNG
  2) use Trang to convert it to both DTDs and XMLSchemas (so that users
     can use whichever suits them)
  3) write a Jing-based validating transformer and validate at whichever
     stage of the pipeline we like.

NOTE: this path is totally orthogonal to the 'token expansion' one.

>>I don't fully understand why we can't give users the option to shoot
>>themselves in the foot by including elements, but implementation-wise
>>there's little difference (two different InputModules).
> An easy implementation doesn't mean there are no problems.
> 1. Entity expansion is recursive. Is ${} expansion recursive too?

I would say no. No recursion for token expansion.

>   Like foo -> ${bar} and bar -> baz.
>   How do you avoid loops? <evil grin>


> 2. Is something like ${${foo}} allowed, supposing "foo" is substituted by
>   "bar" and "bar" by "baz"? Don't forget to explain the difference to
>   recursive expansion as in 1.

That should not be allowed. Only one pass of token expansion will be
performed.
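
To make the single-pass rule concrete, here is a minimal Python sketch. The function name `expand_once` and the exact `${name}` pattern are assumptions for illustration, not an agreed syntax. It relies on `re.sub` never rescanning the text it has just substituted, so Joerg's chain (foo -> ${bar}, bar -> baz) stops after one step and a loop cannot occur.

```python
import re

# A token is ${name}, where name contains no '$', '{' or '}'.
TOKEN = re.compile(r"\$\{([^{}$]+)\}")


def expand_once(text, tokens):
    """Replace each ${name} once; substituted text is never rescanned."""
    return TOKEN.sub(lambda m: tokens.get(m.group(1), m.group(0)), text)


tokens = {"foo": "${bar}", "bar": "baz", "a": "${b}", "b": "${a}"}

print(expand_once("${foo}", tokens))  # prints ${bar}, not baz
print(expand_once("${a}", tokens))    # prints ${b}; the loop never starts
```

In this sketch a nested `${${foo}}` is not rejected outright: the inner token is expanded once and the outer `${` and `}` are left as literal text. A stricter implementation could raise an error instead.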

> 3. An XML file with a ${} substituted by a subtree with mandatory
>   elements at the place is not valid. For example
>   <!DOCTYPE foo [
>     <!ELEMENT foo (a)>
>     <!ELEMENT a (#PCDATA)>]>
>   <foo>${foo}</foo>
>   and foo expands to <a>bar</a>.
>   That's the point of restricting substitutions to text.

Exactly. Token expansion should be limited to text and will escape
anything into text (so if you had nested elements, they will end up
escaped, as in a big CDATA section).
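
As a small illustration of the escaping rule, here is a Python sketch using the standard `xml.sax.saxutils.escape`. The helper name `insert_as_text` is hypothetical, and the `<foo>` wrapper simply mirrors Joerg's example. The token value `<a>bar</a>` ends up as character data, so `<foo>` gains no child elements.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def insert_as_text(token_value):
    """Place a token value inside <foo>, escaped so it stays plain text."""
    return "<foo>" + escape(token_value) + "</foo>"


doc = insert_as_text("<a>bar</a>")
print(doc)  # prints <foo>&lt;a&gt;bar&lt;/a&gt;</foo>

root = ET.fromstring(doc)
print(len(root))   # prints 0: no child elements were introduced
print(root.text)   # prints <a>bar</a>, as literal text
```

Because nothing in the expanded value is treated as markup, no namespace prefix can leak in from the token repository either.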

> 4. Elements in ${} substitution get their namespaces from the repository,
>  I think. Like if foo -> <nn:a>, the binding for the nn prefix is taken from
>  the repository XML file rather than from the document where ${foo}
>  occurs. XInclude has the same problem, but then, the XInclude spec
>  takes care of this aspect.
>  Well, namespaces and entities mix even less well.

The above fixes this as well.

> Last but not least I think giving users plenty of means to shoot themselves
> in the foot is not a very good approach, even if the users demand them.

There has been *no* demand for things that let users shoot themselves
in the foot. The demand is: I want to store text tokens in one place and
use them all over, so that updates are easier.

The use of ${*:*} variables with copying-over fallback allows that with 
very little hassle and doesn't create future problems if:

  1) token expansion is a single pass (no recursion, loops or other 
weird things)
  2) expanded tokens are escaped (no internal structure allowed)
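
Both conditions, together with the copying-over fallback, might be sketched in Python as follows. This reads `${*:*}` as `${module:key}`, which is an assumption, as are the function name `expand` and the nested-dict layout of `modules`. Anything that does not resolve is copied to the output unchanged.

```python
import re
from xml.sax.saxutils import escape

# ${module:key}, where neither part contains ':', '$', '{' or '}'.
TOKEN = re.compile(r"\$\{([^:{}$]+):([^:{}$]+)\}")


def expand(text, modules):
    """Single-pass, escaping expansion with copy-over fallback."""

    def replace(match):
        module, key = match.group(1), match.group(2)
        value = modules.get(module, {}).get(key)
        if value is None:
            return match.group(0)  # fallback: copy the token over as-is
        return escape(value)       # condition 2: no internal structure

    # Condition 1: re.sub makes one pass and never rescans its output.
    return TOKEN.sub(replace, text)


modules = {"site": {"name": "Forrest", "motto": "R&D <fast>"}}
print(expand("${site:name}: ${site:motto} ${other:x} ${plain}", modules))
# prints Forrest: R&amp;D &lt;fast&gt; ${other:x} ${plain}
```

The fallback means text that merely looks like a variable, in a shell script listing for example, passes through untouched unless a matching module and key exist.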

Also, I see no need for an escaping syntax, since these variables will
probably appear inside code snippets, and those normally need CDATA
escaping anyway for <, > and &.

> Read through the discussions about <xsl:script> on the XSL list for some
> arguments.
>>>XML editors
>>vim + xmllint
> External validation, can be handled easily.
>>>- Write a customized toolset.
> The processor doing the substitution, perhaps catalogue support, cross
> references, authoring support. Someone might also want to have a
> processor working outside Cocoon.
>>Just like the C preprocessor, It is an opt-in solution to a practical
> I've seen simple "solutions to practical problems" used and getting into
> deep doo-doo in the long term much too often. This kind of pragmatism
> brought us BASIC, file name suffixes denoting the content format, Tag
> Soup and the unmentionable abominations related to what's commonly
> called gHorribleKludge on XML-DEV. I still think the world would be a
> better place if such aberrations had been avoided. Also, propagators
> of "pragmatic solutions" tend to walk on to the next buzz, leaving the
> mess to others to clean up. :-/

No shit.

Look at what FS can do to you:

It's scary to see that the only people who actually *get it* are those
who are not sitting on an expert group.

Sometimes it's better to just say: "screw the damn W3C" and do your own
simple KISS stuff and have a user community keep you honest about it.

Will the W3C ever get this? Well, hopefully some of them read xml-dev. :-)

Stefano Mazzocchi
