xml-general mailing list archives

From Scott_B...@lotus.com
Subject Re: Potential Xerces regression (bug# 933)
Date Mon, 12 Mar 2001 06:36:31 GMT

OK.  First of all, I believe the problem with processing DOMs from XercesJ
1.3.0 and later (which I guess is what xml-stylebook uses, though this makes
my skin itch... it ought to be using stream processing... I hope no one is
complaining about performance) is easily flagged with "build smoketest".
We really need to get "build smoketest" working properly with gump.  I'll
try to put some focus on this next week.

There are at least two major problems.  For the first, I've put a hack in
Xalan to work around it, but I'm not sure there is a workaround for the
other problem, which happens *after* Xerces 1.3.0.  Neither of these is
reported in Bugzilla that I can see, so I'll try to report them tomorrow if
I get a chance.

Second of all, sorry for this long note.  I just want to make sure I have
all the information down in one place.

BTW, the testing I am doing in regard to this is all on the main branch,
with the latest source checked out tonight.

========
The first problem is that I believe Xerces at some point decided to use ""
instead of null for the null namespace.  There has been a discussion between
Gary Peskin and Joe Kesselman on xalan-dev about this, but I hadn't been
keeping up with the thread that well, and missed its relation to this
problem.  I include some of the discussion at the end of this note.  For
now, I made Xalan able to compare a "" namespace to null, until we get this
resolved.

The gist of the discussion is:

>>1.  Declare the Xerces-J support of schemas to have a bug and ask that
>>Xerces be corrected to always use a null namespace URI to indicate that
>>there is no default namespace.  Even if the Xerces people change this
>>behavior, is this correct?
>
> Yes. If your description of the problem is accurate (you should probably
> submit a more detailed case so it can be reproduced in the lab), this is a
> parser/DOM-builder bug.

========
The other problem is "DOM006 Hierarchy request error" when outputting to a
DOM.  For some very strange reason someone decided that
DocumentBuilder#newDocument() should add an element named "root" to the
Document it creates.  Then, when Xalan goes to add the first element out of
the transform to the Document node, which already has "root" as its
document element, you predictably get "DOM006 Hierarchy request error".
In a unit test I do:

        DocumentBuilder docBuilder = dfactory.newDocumentBuilder();
        Node xmlDoc = docBuilder.parse(new InputSource("foo.xml"));
        org.w3c.dom.Document outNode = docBuilder.newDocument();
        transformer.transform(new DOMSource(xmlDoc, "foo.xml"),
                              new DOMResult(outNode));
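A self-contained version of that unit test, as a sketch: it assumes the JDK's bundled JAXP implementation, an identity Transformer in place of the stylesheet-driven transformer, and an inline XML string in place of foo.xml. When newDocument() returns an empty Document, the transform output simply becomes its document element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomResultSmokeTest {
    // Parses xml, then transforms it into a Document obtained from newDocument().
    static Document transformToNewDocument(String xml) {
        try {
            DocumentBuilderFactory dfactory = DocumentBuilderFactory.newInstance();
            dfactory.setNamespaceAware(true);
            DocumentBuilder docBuilder = dfactory.newDocumentBuilder();
            Node xmlDoc = docBuilder.parse(new InputSource(new StringReader(xml)));

            // Should be empty: no document element yet.
            Document outNode = docBuilder.newDocument();

            // Identity transform stands in for a real stylesheet (assumption).
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.transform(new DOMSource(xmlDoc, "foo.xml"), new DOMResult(outNode));
            return outNode;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document out = transformToNewDocument("<doc><Class>x</Class></doc>");
        System.out.println(out.getDocumentElement().getNodeName()); // doc
    }
}
```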

In Xerces 1.2.3 and Xerces 1.3.0, DocumentBuilderImpl#newDocument()
[version 1.2] was (properly, I think) implemented as:

    public Document newDocument() {
        return(new org.apache.xerces.dom.DocumentImpl());
    }

In DocumentBuilderImpl [version 1.3] and later, newDocument() is
implemented as:
    public Document newDocument() {
        DOMImplementation di = getDOMImplementation();
        // XXX What should the root element be named???
        String qName = "root";
        DocumentType docType = di.createDocumentType(qName, null, null);
        return di.createDocument(null, qName, docType);
    }

Weird.  Version 1.3 of DocumentBuilder has the CVS log:

----------------------------
Revision : 1.3
Date : 2001/2/3 0:28:59
Author : 'edwingo'
State : 'Exp'
Lines : +106 -78
Description :
Merged in from Xerces 2: implementation of parsing
component(javax.xml.parsers) of JAXP 1.1
----------------------------

So the same problem may exist in Xerces2, and I don't want to trace the
culprit before the merge.  Anyway, someone is sorely mistaken if they think
that newDocument() should create a magic root node.  I hope this can be
fixed as soon as possible.
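A sketch of why the pre-populated Document trips DOM006, assuming the JDK's bundled DOM with error checking on (its default): it mimics the version 1.3 newDocument() using the same DOMImplementation calls, then appends an element under the Document node the way a DOMResult builder would for the first output element. The helper names are hypothetical; DOMException.HIERARCHY_REQUEST_ERR is the DOM code behind the "Hierarchy request error" message.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

public class MagicRootDemo {
    static DocumentBuilder builder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Empty Document, matching the version 1.2 behavior.
    static Document newEmptyDocument() {
        return builder().newDocument();
    }

    // Mimics version 1.3: the Document already owns a <root> element.
    static Document newDocumentWithRoot() {
        DOMImplementation di = builder().getDOMImplementation();
        String qName = "root";
        DocumentType docType = di.createDocumentType(qName, null, null);
        return di.createDocument(null, qName, docType);
    }

    // Appends an element directly under the Document node.
    // Returns the DOMException code, or -1 if the append succeeded.
    static int appendElement(Document doc, String name) {
        try {
            doc.appendChild(doc.createElement(name));
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        int code = appendElement(newDocumentWithRoot(), "out");
        System.out.println(code == DOMException.HIERARCHY_REQUEST_ERR); // true
        System.out.println(appendElement(newEmptyDocument(), "out"));   // -1
    }
}
```

A Document may have at most one element child, so any builder that treats the Document as the insertion point fails as soon as a "root" is already there.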

I include the discussion about "" as a null namespace from xalan-dev after
my signature.

-scott


----- Forwarded by Scott Boag/CAM/Lotus on 03/12/2001 01:23 AM -----

From:     Gary L Peskin <garyp@firstech.com>
Date:     03/02/2001 12:49 AM
To:       xalan-dev@xml.apache.org
cc:       Xerces-J Development <xerces-j-dev@xml.apache.org>,
          (bcc: Scott Boag/CAM/Lotus)
Subject:  Re: [Fwd: Xalan2 with Xerces1.3]
Please respond to xalan-dev

Joseph_Kesselman@lotus.com wrote:
>
> Speaking as DOM WG alternate representative:

Thanks, Joe.  Your answers cleared things up for me.  I'll work up a
small test case to demonstrate the problem and submit it to the Xerces-J
list and bugzilla.

Gary


----- Forwarded by Scott Boag/CAM/Lotus on 03/12/2001 01:19 AM -----

From:     Joseph_Kesselman@lotus.com
Date:     03/01/2001 10:31 PM
To:       xalan-dev@xml.apache.org
cc:       Xalan Development <xalan-dev@xml.apache.org>,
          Xerces-J Development <xerces-j-dev@xml.apache.org>,
          (bcc: Scott Boag/CAM/Lotus)
Subject:  Re: [Fwd: Xalan2 with Xerces1.3]
Please respond to xalan-dev

Speaking as DOM WG alternate representative:

>the null namespace and the "" namespace.  The DOM Level 2 Core document
>states that these are two different namespaces

Yep. That wasn't an easy decision, but it really did seem to be the best
available answer since we didn't want to either force folks to test for
both or "automagically" convert one into the other. As you noted, either
would add overhead in order to suppress something which shouldn't be
allowed to arise in the first place.


>The XML Namespace recommendation indicates that the "" namespace URI is
>the same as the default namespace

Actually, it doesn't. The DOM WG checked that very carefully before we made
the above decision. The namespace spec is trying to say that an XML
namespace declaration with the empty-string value is special-cased as a
request to "undefine" the prefix and return to the default namespace, in
lieu of inventing another syntax or magic name for that case. It was _NOT_
intended to assert that the default namespace's name was the empty string.
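The "undefine" behavior can be observed with any namespace-aware parser. A minimal sketch, assuming the JDK's bundled parser and a made-up namespace urn:example: the inner xmlns='' takes <Class> back out of the outer default namespace, and the parser reports its namespace URI as null rather than "". This shows a current JDK's behavior, not the Xerces 1.3 schema path under discussion.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class UndeclareDefaultNs {
    // Parses xml namespace-aware and returns the namespace URI of the
    // first element (in any namespace) with the given local name.
    static String namespaceOf(String xml, String localName) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element e = (Element) doc.getElementsByTagNameNS("*", localName).item(0);
            return e.getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a xmlns='urn:example'><Class xmlns=''/></a>";
        System.out.println(namespaceOf(xml, "a"));     // urn:example
        System.out.println(namespaceOf(xml, "Class")); // null
    }
}
```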


>1.  Declare the Xerces-J support of schemas to have a bug and ask that
>Xerces be corrected to always use a null namespace URI to indicate that
>there is no default namespace.  Even if the Xerces people change this
>behavior, is this correct?

Yes. If your description of the problem is accurate (you should probably
submit a more detailed case so it can be reproduced in the lab), this is a
parser/DOM-builder bug.


>Will we have problems with other XML parsers?

Not unless they have similar bugs.

----- Forwarded by Scott Boag/CAM/Lotus on 03/12/2001 01:20 AM -----

From:     Gary L Peskin <garyp@firstech.com>
Date:     03/01/2001 02:12 PM
To:       Xalan Development <xalan-dev@xml.apache.org>,
          Xerces-J Development <xerces-j-dev@xml.apache.org>
cc:       (bcc: Scott Boag/CAM/Lotus)
Subject:  [Fwd: Xalan2 with Xerces1.3]
Please respond to xalan-dev

Joe, Scott, Xerces people --

HELP!!  Somsak has submitted the problem below.  I have investigated and
found out the cause.  The short answer, I think, is a confusion between
the null namespace and the "" namespace.  The DOM Level 2 Core document
states that these are two different namespaces:

"Note that because the DOM does no lexical checking, the empty string
will be treated as a real namespace URI in DOM Level 2 methods.
Applications must use the value null as the namespaceURI parameter for
methods if they wish to have no namespace."

The source of Somsak's immediate problem is that, when a schema is
defined on the input XML document, Xerces creates a node with a ""
namespace URI.  When no schema is defined on the input XML document,
Xerces creates a document with a null namespace URI.

The XML Namespace recommendation indicates that the "" namespace URI is
the same as the default namespace
(http://www.w3.org/TR/1999/REC-xml-names-19990114/#defaulting):

"The default namespace can be set to the empty string. This has the same
effect, within the scope of the declaration, of there being no default
namespace. "

Xerces also uses a null namespace URI to indicate that there is no
default namespace.

As we parse and compile the XPath match pattern in

  <xsl:template match="Class">

we encode this as a NodeTest with a null namespaceURI.  Several sections of
Xalan code test for and recognize a null namespaceURI as being the
default namespace.

Now, when we go to match the <Class> node in the input XML, our match
fails when schemas are used because we're matching our null namespaceURI
with the input DOM's namespaceURI of "".  This ends up invoking the
built-in template for <Class>, which does an apply-templates of the
children which adds a bunch of text strings into the result tree.  These
unparented text strings are what cause the DOM006 error.
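The DOM rule behind this failure can be checked without Xalan. A minimal sketch, assuming the JDK's bundled DOM; the helper name is hypothetical: a Document node may not have Text children, so text that reaches the result tree before any element has nowhere legal to go.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

public class UnparentedTextDemo {
    // Tries to add a Text node directly under an empty Document node.
    // Returns the DOMException code, or -1 if the DOM accepted it.
    static int appendTextToDocument(String text) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        try {
            doc.appendChild(doc.createTextNode(text));
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        // Prints true: the Document rejects the unparented text.
        System.out.println(appendTextToDocument("Foo") == DOMException.HIERARCHY_REQUEST_ERR);
    }
}
```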

If, on the other hand, schemas are not used, Xerces reports a null
namespaceURI in the input XML and our match works fine.

So, on the Xalan team, I guess we have a few options:
1.  Declare the Xerces-J support of schemas to have a bug and ask that
Xerces be corrected to always use a null namespace URI to indicate that
there is no default namespace.  Even if the Xerces people change this
behavior, is this correct?  Will we have problems with other XML
parsers?
2.  The reverse of the above: ask Xerces-J to always use an empty
string to indicate that there is no default namespace.  Same issues as
1.  I suspect this will also cause lots of code changes in Xalan.
3.  Have a new compare method for namespaces based on
NodeTest.subPartMatch that tests whether either namespace is the empty
string and allows that to compare equal to null.  This is probably the
most flexible but we will take a performance hit while attempting to
match template match patterns.  Since this is such a heavily used
section of the code, I'm hesitant to add additional path length but
there may be no way around it.
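Option 3 amounts to treating null and "" as the same "no namespace" value at comparison time. A minimal sketch, using a hypothetical helper rather than Xalan's actual NodeTest code: the identical-reference check comes first, so the common case (both null, or the same interned URI) costs one comparison, and the extra emptiness tests run only on the mismatch path. Note that this is a workaround; per the DOM WG reply above, the two values are distinct in DOM Level 2.

```java
public final class NamespaceCompare {
    private NamespaceCompare() {}

    // Hypothetical helper: true if two namespace URIs denote the same
    // namespace, treating null and "" both as "no namespace".
    static boolean namespacesMatch(String a, String b) {
        if (a == b) {
            return true; // same reference, including both null: fast path
        }
        boolean aNone = (a == null || a.isEmpty());
        boolean bNone = (b == null || b.isEmpty());
        if (aNone || bNone) {
            return aNone && bNone;
        }
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(namespacesMatch(null, ""));         // true
        System.out.println(namespacesMatch(null, "urn:x"));    // false
        System.out.println(namespacesMatch("urn:x", "urn:x")); // true
    }
}
```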

Thoughts?

Gary



