forrest-dev mailing list archives

From Jeff Turner <je...@apache.org>
Subject Re: [RT] Entities in XML docs
Date Mon, 30 Dec 2002 12:06:39 GMT
On Sun, Dec 29, 2002 at 08:38:04PM +0100, Joerg Pietschmann wrote:
> On Sunday 29 December 2002 04:47, Jeff Turner wrote:
> > That was Stefano's suggestion: that we do text-only expansion for now
> > (element expansion is still possible with xinclude), and when we migrate
> > to a decent schema language we can think about removing the text-only
> > restriction.
> 
> Why not migrate to either a more powerful schema language
> or another validation process right now?

Can do, if it solves more problems than it creates.

> AFAIR your proposal was meant as a mechanism to supplant
> XML entities, in particular in contexts where it is hard for users
> to get their entity definitions into the DTD.
> The problem you want to avoid is that a document with <xi:include>
> or <nn:replace> would not validate.
> 
> Entities work because they are part of the DTD against which the
> parser validates and because the parser expands them before
> examining the context for validation.
> In any other approach, the parser does not know about the
> substitutions to be made. Because the validation is, historically,
> still an integral part of the parsing step, rather than a separate
> step, this may cause problems. This is independent of whether
> the substitution is done by XInclude, an XSLT replacing
> <nn:replace> elements or ${} substitution.
> This doesn't mean we can't solve the problem: Run a processor
> doing the expansion, then a validator. If performance doesn't
> matter all that much, an intermediate file can be used. Unfortunately,
> I don't know of any validator taking a SAX event stream as input
> for better performance, but I'm sure if the need arises, someone
> will take care of this. The only remaining problem is schema-directed
> editors.

Yes, I think that is generally understood to be the long-term goal.

> > I don't fully understand why we can't give users the option to shoot
> > themselves in the foot by including elements, but implementation-wise
> > there's little difference (two different InputModules).
> An easy implementation doesn't mean there are no problems.
> 1. Entity expansion is recursive. Is ${} expansion recursive too?
>   Like foo -> ${bar} and bar -> baz.

Don't see why not.  It's quite useful:

projName = Forrest
proj = ${projName} v0.3.1

>   How do you avoid loops? <evil grin>

Two options:
 1) Loop detection algorithm, like XInclude.  Can't be that hard can it?
 2) Same way entity expansion avoids loops: a variable value cannot
 contain an undeclared variable reference.
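
As a rough illustration of option 1, recursive ${} expansion with loop
detection could look something like the sketch below. Everything in it
(the expand function, the ExpansionLoop exception, the regular expression
for variable names) is hypothetical and not taken from any Forrest or
Cocoon code; it only assumes variables live in a plain name -> value map.

```python
import re

# Assumed syntax: ${name}, where name is a simple identifier (dots allowed).
VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


class ExpansionLoop(Exception):
    """Raised when a variable ends up referring to itself."""


def expand(text, variables, _active=()):
    """Expand ${name} references recursively, refusing to follow loops.

    _active holds the names currently being expanded, outermost first.
    """
    def replace(match):
        name = match.group(1)
        if name in _active:
            raise ExpansionLoop(" -> ".join(_active + (name,)))
        if name not in variables:
            return match.group(0)  # unknown reference: leave it untouched
        return expand(variables[name], variables, _active + (name,))

    return VAR.sub(replace, text)


variables = {"projName": "Forrest", "proj": "${projName} v0.3.1"}
print(expand("Welcome to ${proj}", variables))  # Welcome to Forrest v0.3.1
```

With a map such as a -> ${b} and b -> ${a}, expanding ${a} raises
ExpansionLoop with the chain "a -> b -> a" instead of recursing forever.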

> 2. Is something like ${${foo}} allowed, supposing "foo" is substituted by
>   "bar" and "bar" by "baz"? Don't forget to explain the difference to
>   recursive expansion as in 1.

Same as if we had:

<!ENTITY foo "bar">
<!ENTITY bar "baz">

And &&foo;;.  I.e., it's not a valid variable, so print a warning and
ignore it.
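
A minimal sketch of that "warn and ignore" behaviour, assuming the same
${name} syntax as above. The substitute function and both regular
expressions are made up for illustration; a nested reference such as
${${foo}} is reported and left in the output verbatim, while plain
references are replaced once.

```python
import re
import warnings

# A reference that contains another reference, e.g. ${${foo}}.
NESTED = re.compile(r"\$\{[^{}]*\$\{[^{}]*\}[^{}]*\}")
# A plain reference, e.g. ${foo}.
SIMPLE = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


def _simple(text, variables):
    """Replace plain ${name} references; unknown names stay as they are."""
    return SIMPLE.sub(lambda m: variables.get(m.group(1), m.group(0)), text)


def substitute(text, variables):
    """Replace plain references, warn about nested ones and skip them."""
    out = []
    pos = 0
    for match in NESTED.finditer(text):
        out.append(_simple(text[pos:match.start()], variables))
        warnings.warn(f"invalid variable reference {match.group(0)!r}, ignored")
        out.append(match.group(0))  # keep the invalid reference as-is
        pos = match.end()
    out.append(_simple(text[pos:], variables))
    return "".join(out)


variables = {"foo": "bar", "bar": "baz"}
print(substitute("x ${${foo}} y ${foo}", variables))  # x ${${foo}} y bar
```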

> 3. An XML file with a ${} substituted by a subtree with mandatory
>   elements at the place is not valid. For example
>   <!DOCTYPE foo [
>     <!ELEMENT foo (a)>
>     <!ELEMENT a (#PCDATA)>]>
>   <foo>${foo}</foo>
>   and foo expands to <a>bar</a>.

That looks valid to me.  I assume you meant foo to expand to <b>bar</b>
or something.

>   That's the point of restricting substitutions to text.

Say we have a layered set of operations:

1) parse XML+ns
2) variable substitution
3) validation

Then why should anything in step 2 care about step 3?  Why should the
variable substituter have to worry about if the result is valid?
XInclude doesn't worry about this.  It deals with infosets, not PSVIs.
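
To make the layering concrete, here is a toy sketch of the three steps
using Python's standard xml.etree.ElementTree. The substitute and
validate functions, the snippet map and the "content model" dictionary
are all invented for this example; real validation would be done by a
DTD or schema validator, not by a tag-list comparison. The point is only
that step 2 never looks at the content model.

```python
import xml.etree.ElementTree as ET


def substitute(root, snippets):
    """Step 2: replace an element whose whole text is ${name} with XML."""
    for elem in list(root.iter()):  # snapshot, since the tree is modified
        text = (elem.text or "").strip()
        name = text[2:-1]
        if text.startswith("${") and text.endswith("}") and name in snippets:
            fragment = ET.fromstring(f"<wrapper>{snippets[name]}</wrapper>")
            elem.text = fragment.text
            elem.extend(list(fragment))
    return root


def validate(root, content_model):
    """Step 3: toy check that children match an exact list of tag names."""
    errors = []
    for elem in root.iter():
        expected = content_model.get(elem.tag)
        actual = [child.tag for child in elem]
        if expected is not None and actual != expected:
            errors.append(f"<{elem.tag}>: expected {expected}, found {actual}")
    return errors


# Stand-in for <!ELEMENT foo (a)> and <!ELEMENT a (#PCDATA)>.
content_model = {"foo": ["a"], "a": []}

for snippet in ("<a>bar</a>", "<b>bar</b>"):
    root = ET.fromstring("<foo>${foo}</foo>")   # step 1: parse
    substitute(root, {"foo": snippet})          # step 2: substitute
    print(snippet, validate(root, content_model))  # step 3: validate
```

The first snippet passes and the second is rejected, but only the
validator knows why; the substituter is the same in both cases.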

> 4. Elements in ${} substitution get their namespaces from the repository,
>  I think. Like if foo -> <nn:a>, the binding for the nn prefix is taken from
>  the repository XML file rather than from the document where ${foo}
>  occurs. XInclude has the same problem, but then, the XInclude spec
>  takes care of this aspect.
>  Well, namespaces and entities mix even less well.

Well I never.  First they tell me Santa isn't real, and now you say
namespaces will cause problems.  Keeping namespace consistency *sounds*
relatively easy, but perhaps it's time for me to read the XInclude spec
properly :)

> Last but not least I think giving users plenty of means to shoot themselves
> in the foot is not a very good approach, even if the users demand them.

See above about layering.  Kapow.. no more foot, but at least we still
have SoC :) :) I love bad puns.

> Read through the discussions about <xsl:script> on the XSL list for some
> arguments.

I remember lots of (non-Java) implementors complaining (quite rightly)
about having to implement Javascript..

> > > XML editors
> > vim + xmllint
> External validation, can be handled easily.
>
> > > - Write a customized toolset.
> > ?
> The processor doing the substitution, perhaps catalogue support, cross
> references, authoring support. Someone might also want to have a
> processor working outside Cocoon.

Yes, inventing some new ${variable} syntax ties the XML to Forrest, which
isn't nice.

So what do we do?

Options I see:

 1) Abandon DTDs and move to a properly layered system, where we can use
    <xi:include> elements (or any other mechanism), and have them replaced
    *before* validation.

 2) Stick with DTDs and the implication that validation occurs before
    variable substitution.

  2.1) Use XInclude, and simply hack any DTDs we need to support the
       xi:include element.  We could provide specially modified driver
       DTDs for things like Docbook, so users don't need to figure out
       which %peref; to modify themselves.

  2.2) Use ${variables} for including XML snippets, and too bad about
       validation.

  2.3) Use ${variables}, but as text-only replacers, so the validity of
       the XML is preserved.  Easy to implement.  Not very useful, as
       I'd imagine many inclusions would be small XML snippets, like
       paragraphs and <link>s.

 3) Tie Forrest to Xerces, and use an XNI XInclude processor:

   "XInclude Processor
     An XNI parser component can be written to handle XInclude by
     analyzing the streaming information set and automatically inserting
     the contents of referenced links into the event stream. By adding
     this component to the parser pipeline before the validator, included
     content would appear transparent to the validator as if that content
     was in the original document. "
     - http://xml.apache.org/xerces2-j/xni.html

My preferences are 1), 3) and 2.2).


> > Just like the C preprocessor, it is an opt-in solution to a practical
> > problem.
> I've seen simple "solutions to practical problems" used and getting into
> deep doo-doo in the long term much too often. This kind of pragmatism
> brought us BASIC, file name suffixes denoting the content format, Tag
> Soup and the unmentionable abominations related to what's commonly
> called gHorribleKludge on XML-DEV. I still think the world would be a
> better place if such aberrations had been avoided. Also, propagators
> of "pragmatic solutions" tend to walk on to the next buzz, leaving the
> mess to others to clean up. :-/

Where would the XML industry be if it weren't for kludges to support,
document, hype, anti-hype, complain about on XML-DEV, code around, write
thick books explaining..

> J.Pietschmann

Thought-provoking email :)  Thanks, it probably saved the project a
time-consuming detour.


--Jeff
