cocoon-dev mailing list archives

From "Rick Jelliffe" <>
Subject Abstract Schemas APIs
Date Tue, 02 Apr 2002 02:03:36 GMT
From: "Ivelin Ivanov" <>
> Have you been following the discussion with Kohsuke on a possible JARV
> integration?
> If you had a chance to see the JARV API, my source code and probably
> Torsten's API, maybe you can elaborate on a possible higher level validation
> API which will encompass multiple schemas.
I posted some thoughts to XML-DEV at
which I will be sending to the DOM WG.

In general, Locators or SAXExceptions should be extended to
    1) carry paths as well as file/line/column messages
    2) carry HTML (or XML) text for richer messages in addition to plain text messages
    3) carry some kind of user-defined status constant apart from the basic Warning etc. types
    4) carry enough information to allow repair of the document
    5) carry enough information to interrogate the current parsing context(s) of the document
    at that failure point
    6) tell which parsing/validation system generated the failure

Some of these are pretty easy to do, but some, like 5), would probably need
a ground-up redesign, since I don't expect validators are designed to allow
snapshots of their states!
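
As a rough illustration of points 1), 2), 3) and 6), here is a minimal Java sketch of an exception that extends the standard SAXParseException. The class name RichValidationException and all of its fields and getters are hypothetical; nothing like it exists in SAX, JARV or Xerces, and it makes no attempt at points 4) and 5).

```java
import org.xml.sax.Locator;
import org.xml.sax.SAXParseException;

/**
 * Hypothetical extension of SAXParseException (not part of SAX, JARV or Xerces).
 * It keeps the usual file/line/column data and adds the extra items from the list.
 */
final class RichValidationException extends SAXParseException {
    private static final long serialVersionUID = 1L;

    private final String path;        // point 1: path to the offending node
    private final String richMessage; // point 2: HTML or XML markup for display
    private final String status;      // point 3: user-defined status constant
    private final String reportedBy;  // point 6: which validator raised it

    RichValidationException(String message, Locator locator, String path,
                            String richMessage, String status, String reportedBy) {
        super(message, locator); // plain text and file/line/column as before
        this.path = path;
        this.richMessage = richMessage;
        this.status = status;
        this.reportedBy = reportedBy;
    }

    String getPath() { return path; }
    String getRichMessage() { return richMessage; }
    String getStatus() { return status; }
    String getReportedBy() { return reportedBy; }
}
```

Because it is still a SAXParseException, an existing ErrorHandler would keep working unchanged; a richer handler could test with instanceof and use the extra fields when they are present.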

We have been using Xerces-J in our editor for external validation, and
the problems we have with it have been
    1) locator error messages for XML Schemas occur too far from the
    actual incident--for example, if a required element is missing the
    locator is for the end-tag of the parent, it seems.
    2) Xerces does not let the user turn different kinds of validation
    and WF checking on and off finely enough to suit the user's interest.
    When checking fragments, it is useless to get error messages relating
    to IDREF and keyref, for example.
    3) This even extends to WF checking. As a parser feature, it would
    be good to allow unrooted documents, or to allow truncated documents
    which miss out on some end-tags at the end of the document, or
    try to match start- and end-tags in a case-insensitive way: this
    would allow much more flexible validation.
    4) The regular expression bug has a known fix, but this has never
    been incorporated AFAIK. I don't see how any XML Schemas datatypes
    can be reliable without it. 
    5) When sending a non-WF document with multiple roots and the
    continue-after-error feature enabled, we get an out-of-memory exception,
    which is out of proportion to the problem that causes it.

In general, there is a design question of whether the technology should impose
a validation checklist on the user, where they have to attend to
earlier problems first, or whether the technology allows the
user to focus on particular regions of a document or areas of
interest first: for example, a user might want to get linking correct
before the metadata but the DTD requires the metadata
for validity.  For documents-in-progress, users should be allowed
to work to their own agenda and order as much as possible;
this has been a long-running problem with SGML and XML.
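
One way an application can approximate "work to your own agenda" with today's parsers is to collect every recoverable validity error and then show only those in the region the user cares about. This is a minimal sketch using the standard SAX ErrorHandler and DTD validation; the class RegionFilteredValidation and its methods are hypothetical names, and filtering by line range is just one stand-in for a "region".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: collect all validity errors, then filter by region of interest. */
final class RegionFilteredValidation {

    /** Records validity errors instead of stopping at the first one. */
    static final class Collector extends DefaultHandler {
        final List<SAXParseException> errors = new ArrayList<>();

        @Override
        public void error(SAXParseException e) {
            errors.add(e); // recoverable: the parser carries on after this returns
        }
        // fatalError is inherited and rethrows, so WF errors still end the parse.
    }

    /** Parses with DTD validation switched on and returns every validity error found. */
    static List<SAXParseException> validate(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            Collector collector = new Collector();
            reader.setErrorHandler(collector);
            reader.parse(new InputSource(new StringReader(xml)));
            return collector.errors;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    /** Keeps only the errors reported between the given lines, inclusive. */
    static List<SAXParseException> inLines(List<SAXParseException> all, int first, int last) {
        List<SAXParseException> kept = new ArrayList<>();
        for (SAXParseException e : all) {
            if (e.getLineNumber() >= first && e.getLineNumber() <= last) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

This only hides messages after the fact; the validator still runs its full checklist, and the filter is only as good as the reported locations.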

For contractual exchange of finished documents, the idea of "validity"
is useful. But for documents-in-progress, it can be counterproductive.
Instead, a more useful idea is "feasibility".  For Xerces to be really useful
in document production, it will need more options or features aimed
at this kind of lesser validation.  I am presenting a paper on this at
XML 2002 in Barcelona next month, by the way, if anyone is interested:
"When well-formedness is too much and validity is not enough"

I hope this is of some use,
Rick Jelliffe    
