axis-java-dev mailing list archives

From "Sanjiva Weerawarana" <>
Subject Re: [Axis2] [Update] XMLConformace Testing Report.
Date Mon, 25 Apr 2005 15:25:06 GMT
Hi Jaya,

Wow, thanks for all the hard work on this!

Am I reading your report correctly: this test didn't find any bugs
at the OM level but rather ran into difficulties at the parser
level? If so I'm very happy :-).

Of the ones that parsed, what made 735 - 567 = 168 documents fail the
comparison? Can we fix that?



----- Original Message ----- 
From: "jayachandra" <>
To: <>
Sent: Monday, April 25, 2005 7:26 PM
Subject: [Axis2] [Update] XMLConformace Testing Report.

> Hi all,
> Total file count in W3C XMLSuite: 2634 (this includes valid, invalid and
> ill-formed XMLs too)
> Of them, valid ones: 960 (i.e. excluding invalid and ill-formed XMLs;
> however, this includes XMLs of both versions 1.0 and 1.1)
> Of them, valid XML 1.0 ones: 832 (i.e. excluding XMLs from the 1.1 version
> folders, since the MXParser we have beneath is only 1.0 compliant)
> On this final set, when OM is tested as is, 335 files got parsed properly,
> and 309 files had the serialized XML matching the input file (comparison
> test). I've implemented OMComment and OMPI and added minimalistic OMDTD
> support (without validation etc.). With those changes the number of files
> parsed increased to 735 and the comparison successes reached 567.
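
As a quick check on these figures, here is a small sketch that turns the quoted counts into rates. It is not part of the original report; the class and method names are made up for illustration, and only the numbers come from the message.

```java
public class ConformanceStats {
    // Counts quoted from the report; everything else here is illustrative.
    static final int VALID_XML10 = 832;
    static final int PARSED_BEFORE = 335;
    static final int COMPARED_BEFORE = 309;
    static final int PARSED_AFTER = 735;
    static final int COMPARED_AFTER = 567;

    /** Returns part/whole as a percentage. */
    static double percent(int part, int whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        System.out.printf("parsed:   %.1f%% -> %.1f%% of %d files%n",
                percent(PARSED_BEFORE, VALID_XML10),
                percent(PARSED_AFTER, VALID_XML10), VALID_XML10);
        System.out.printf("compared: %.1f%% -> %.1f%% of %d files%n",
                percent(COMPARED_BEFORE, VALID_XML10),
                percent(COMPARED_AFTER, VALID_XML10), VALID_XML10);
        // Files that parsed but did not round-trip to matching XML.
        System.out.println("parsed but not matching: "
                + (PARSED_AFTER - COMPARED_AFTER));
    }
}
```

The last line is the 735 - 567 gap asked about in the reply at the top of this message.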
> The parsing failures found can be attributed to one or more of the
> following observations I could make. This is not an exhaustive
> 1. For files where the XML declaration line mentions the 'standalone'
> attribute prior to the 'encoding' attribute, the underlying MXParser threw
> an exception with a message reading something like "Expected 'e' in
> encoding and not 's'". Alek! Is this a known issue with StAX? What do you
> 2. For files in which the DTD declaration has a right square bracket (']')
> as a literal value of some entity, MXParser is treating it as end of DTD
> 3. Some XMLs having multi-byte characters (the UK currency pound sign
> amongst others) are failing to get parsed, with typical exception messages
> like "only whitespace content allowed before start tag and not \ufffd". I
> have passed a "UTF-8" aware reader to the builder; do I need to use
> something else here?
> 4. Apart from these, because I couldn't complete the DTD infoset
> implementation, some more files are failing to get parsed.
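
On item 3: the \ufffd in that message is U+FFFD, the replacement character Java's decoders substitute for bytes that are not valid in the chosen charset. One possible explanation (an assumption, the report does not confirm it) is that some suite files are not UTF-8 at all, for example UTF-16 with a byte order mark, so a UTF-8 reader garbles the very first bytes. The sketch below uses the standard javax.xml.stream API rather than any MXParser-specific call, and hands the parser raw bytes so it can detect the encoding itself. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EncodingDetectionSketch {

    /** Mimics a "UTF-8 aware reader": decodes the bytes as UTF-8 no matter what. */
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Lets the parser sniff the encoding (BOM / XML declaration) from raw bytes. */
    static String firstElementText(byte[] raw) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(raw));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getElementText();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // A pound sign in a UTF-16 document; Java's UTF_16 encoder writes a BOM.
        byte[] raw = "<r>\u00a3</r>".getBytes(StandardCharsets.UTF_16);

        // Forced UTF-8 decoding turns the BOM bytes into replacement characters.
        System.out.println((int) decodeAsUtf8(raw).charAt(0) == 0xFFFD);

        // Byte-level input leaves encoding detection to the parser.
        System.out.println("\u00a3".equals(firstElementText(raw)));
    }
}
```

Whether the builder in question accepts an InputStream instead of a Reader is not something this message states, so treat that part as a suggestion to check.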
> Regarding the comparison, some of the observed reasons for failure are…
> 1. Many SYSTEM identifiers in DTD declarations used a relative reference,
> and so far we haven't considered the 'baseURI' property (does the StAX
> parser provide one?) for any of the elements; hence the XML comparator
> (xmlunit) couldn't resolve the system identifiers, leading to a mismatch
> between the serialized XML and the original input form.
> 2. Also, since the DTD support is naïve, the presentation of data is
> completely ignored, leading to scenarios like serializing as #PCDATA when
> the DTD says CDATA. This also led to significant comparison failures.
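
To illustrate the base URI point in item 1, here is a minimal sketch of resolving a relative SYSTEM identifier against the location of the document that declares it, using only java.net.URI. The file paths and the class and method names are invented for the example; how OM or xmlunit would actually be given the base URI is not covered here.

```java
import java.net.URI;

public class BaseUriSketch {

    /**
     * Resolves a SYSTEM identifier against the URI of the declaring document.
     * Absolute identifiers come back unchanged; relative ones are resolved
     * against the directory of the base document.
     */
    static String resolveSystemId(String baseUri, String systemId) {
        return URI.create(baseUri).resolve(systemId).toString();
    }

    public static void main(String[] args) {
        // Hypothetical location of a test file and its relative entity reference.
        String base = "file:/suite/valid/sa/001.xml";
        System.out.println(resolveSystemId(base, "001.ent"));
        System.out.println(resolveSystemId(base, "http://example.org/x.dtd"));
    }
}
```

Without a base, a comparator that sees only "001.ent" has nothing to resolve it against, which matches the mismatch described above.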
> Thanks,
> Jaya
> ---- Jaya
