axis-java-dev mailing list archives

From jayachandra <>
Subject [Axis2] [Update] XML Conformance Testing Report.
Date Mon, 25 Apr 2005 13:26:24 GMT
Hi all,

Total file count in the W3C XML test suite: 2634 (this includes
valid, invalid and ill-formed XML files)
Of them, valid ones: 960 (i.e. excluding invalid and ill-formed
files; however, this includes files for both XML 1.0 and 1.1)

Of them, valid XML 1.0 ones: 832 (i.e. excluding files from the 1.1
version folders, since the MXParser we have underneath supports only
XML 1.0)

On this final set, when OM was tested as is, 335 files were parsed
properly, and 309 files had serialized XML matching the input file
(comparison test).
I've implemented OMComment and OMPI and added minimal OMDTD support
(without validation etc.). With those changes the number of files
parsed rose to 735 and the number of comparison successes to 567.
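
For readers who want to reproduce the "parsed properly" count, here is
a minimal sketch of such a tally. It is an assumption-laden stand-in:
it uses the StAX implementation bundled with the JDK rather than OM on
top of MXParser, and the class name, method names and sample documents
are made up for illustration.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Axis2: counts documents that a StAX
// parser can pull through to the end without reporting an error.
public class ParseTallySketch {

    // True if every event of the document can be read without a parser error.
    static boolean parses(XMLInputFactory factory, String xml) {
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    // Number of documents in the list that parse cleanly.
    static int countParsed(List<String> documents) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        int parsed = 0;
        for (String xml : documents) {
            if (parses(factory, xml)) {
                parsed++;
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Illustrative inputs only; the real run iterated over the suite's files.
        List<String> docs = Arrays.asList(
            "<a><b/></a>",
            "<!-- comment --><?pi data?><a>text</a>",
            "<a><b></a>"); // mismatched end tag, should be rejected
        System.out.println(countParsed(docs) + " of " + docs.size() + " parsed");
    }
}
```

The comparison step is left out here because it depends on an XML
comparator such as XMLUnit.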

The parsing failures found can be attributed to one or more of the
following observations. This is not an exhaustive list.

1. For files where the XML declaration mentions the 'standalone'
attribute before the 'encoding' attribute, the underlying MXParser
threw an exception with a message reading something like "Expected 'e'
in encoding and not 's'". Alek, is this a known issue with StAX? What
do you think?

2. For files in which the DTD declaration has a right square bracket
(']') as the literal value of some entity, MXParser treats it as the
end of the DTD declaration.

3. Some XML files containing multi-byte characters (the UK pound sign
among others) fail to parse, with typical exception messages like
"only whitespace content allowed before start tag and not \ufffd". I
have passed a "UTF-8"-aware reader to the builder; do I need to use
something else here?

4. Apart from these, because I couldn't implement the complete DTD
infoset, some more files fail to parse.
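
Observations 1 and 3 can be probed outside OM with a small sketch like
the one below. It rests on assumptions: it uses the JDK's bundled StAX
parser rather than MXParser, and the class, method names and sample
document are hypothetical. On observation 1, the XML 1.0 grammar fixes
the declaration order as version, then encoding, then standalone, so a
parser that rejects 'standalone' before 'encoding' may simply be
following the spec. On observation 3, one possible cause (not confirmed
by this report) is that some files are not UTF-8 at all: a UTF-16 file
decoded through a UTF-8 reader starts with \ufffd, whereas handing the
parser the raw byte stream lets it detect the encoding itself.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical probes, not part of Axis2 or MXParser.
public class ParserProbeSketch {
    static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    // Observation 1: does the parser accept a document starting with this declaration?
    static boolean acceptsDeclaration(String declaration) {
        try {
            XMLStreamReader reader =
                FACTORY.createXMLStreamReader(new StringReader(declaration + "<a/>"));
            while (reader.hasNext()) {
                reader.next();
            }
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    // Observation 3: a made-up UTF-16 document (with byte order mark) holding a pound sign.
    static byte[] utf16Sample() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><price>\u00A3 sterling</price>";
        return xml.getBytes(StandardCharsets.UTF_16);
    }

    // Decoding the bytes as UTF-8 by hand, as a UTF-8 reader would.
    static String decodedAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Letting the parser see the raw bytes so it can pick the encoding itself.
    static String textFromStream(byte[] bytes) {
        try {
            XMLStreamReader reader =
                FACTORY.createXMLStreamReader(new ByteArrayInputStream(bytes));
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsDeclaration(
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"));
        System.out.println(acceptsDeclaration(
            "<?xml version=\"1.0\" standalone=\"yes\" encoding=\"UTF-8\"?>"));
        byte[] sample = utf16Sample();
        System.out.println(decodedAsUtf8(sample).charAt(0) == '\uFFFD');
        System.out.println(textFromStream(sample));
    }
}
```

If the failing files turn out to be UTF-8 after all, this guess does
not apply and the cause lies elsewhere.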

Regarding the comparison, some of the observed reasons for failure are:

1. Many SYSTEM identifiers in DTD declarations use a relative
reference, and so far we have not considered a 'baseURI' property (does
the StAX parser provide one?) for any of the elements. Hence the XML
comparator (XMLUnit) couldn't resolve the system identifiers, leading
to a mismatch between the serialized XML and the original input.
2. Also, since the DTD support is naïve, the presentation of data is
completely ignored, leading to scenarios like serializing as
#PCDATA when the DTD says CDATA. This also led to significant comparison
failures.
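
On the base URI question in point 1, the standard StAX API does have
hooks that may help: XMLInputFactory.createXMLStreamReader(String
systemId, InputStream stream) accepts the document's system identifier,
and XMLStreamReader.getLocation().getSystemId() reports one. Whether
MXParser honours them is not something this report establishes. Given a
base, resolving a relative SYSTEM identifier is plain URI resolution,
as in this minimal sketch. The class name and the example URIs are
hypothetical.

```java
import java.net.URI;

// Hypothetical helper, not part of Axis2: resolves a SYSTEM identifier
// against the URI of the document that declares it.
public class SystemIdSketch {

    // A relative identifier is resolved against the document URI;
    // an absolute identifier is returned unchanged.
    static URI resolveSystemId(URI documentUri, String systemId) {
        return documentUri.resolve(systemId);
    }

    public static void main(String[] args) {
        // Made-up location standing in for a file of the test suite.
        URI doc = URI.create("http://example.org/xmlconf/valid/sa/097.xml");
        System.out.println(resolveSystemId(doc, "097.ent"));
        System.out.println(resolveSystemId(doc, "../ext-sa/001.ent"));
    }
}
```

Passing the resolved identifiers to the comparator, or giving it the
same base, should let both sides of the comparison find the same
external entities.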


-- Jaya