db-derby-dev mailing list archives

From Satheesh Bandaram <sathe...@Sourcery.Org>
Subject Re: [PATCH] Initial XML Support
Date Wed, 25 May 2005 23:58:25 GMT
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
  <title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Great! I hope this initial XML support in Derby will lead to other XML
enhancements as well. SQL/XML is a large specification with many other
features, such as publishing functions, XMLConcat, mapping of SQL data
into XML, schema validation, XMLCast, XMLQuery and XMLTable.<br>
<br>
I have briefly looked at the proposed changes. Some initial comments:<br>
<ol>
  <li>I suspect the logic to reject XML columns at the top level is more
complicated than it needs to be. Would looking at the list of
ResultColumns at a top-level cursor node be sufficient? A check in
ReadCursorNode for ResultColumns of XML type might be enough. We already
have a similar check for '?' (parameters): you can't have a parameter as
a top-level SELECT result column. Derby allows 'select * from t where
i=?', but not 'select i, ? from t where i=?'. That seems similar to the
XML restriction. Take a look at the rejectParameters() method. If this
logic can be reused, it should be possible to simplify the changes in
sqlgrammar.jj, SelectNode, ResultColumn, RowResultSetNode,
ResultColumnList, etc.</li>
  <li>Is it possible to consolidate some of the new nodes into existing
nodes, or with each other? As Dan mentioned recently, adding new classes
increases Derby's footprint, so it may be worth merging some of these
new nodes.</li>
  <li>How do you query XML documents that use namespaces? Also, I think
you have to turn on namespace processing for Xalan to get correct
results, and I didn't see those flags being set, so namespace queries
may not work correctly.</li>
  <li>You have mentioned that XMLConstant is only used for null values.
However, with some subqueries, the current compilation may evaluate
XMLParse() into an XMLConstant before evaluating it further. In those
cases, it is possible to have a valid XML constant that is not null.<br>
  </li>
</ol>
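<br>
Point 3 can be illustrated outside Derby. The JAXP parser bundled with the JDK is not namespace-aware unless asked to be. The following minimal sketch uses javax.xml.parsers rather than the patch's code or Xalan itself. The names NamespaceDemo, parseRoot() and xml() are hypothetical, and xml() writes '{' and '}' in place of angle brackets so the snippet survives this HTML page. The sketch parses the same prefixed element twice and prints what the DOM knows about it in each mode.<br>
<pre wrap="">
```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Minimal sketch: the effect of JAXP's namespace-awareness flag on a DOM.
// xml() is a hypothetical helper; '{' and '}' stand in for angle brackets.
class NamespaceDemo {
    public static void main(String[] args) {
        String doc = xml("{p:simp xmlns:p='urn:example:simp'} doc {/p:simp}");
        Element aware = parseRoot(doc, true);
        Element plain = parseRoot(doc, false);
        // Namespace-aware: prints "urn:example:simp / simp"
        System.out.println(aware.getNamespaceURI() + " / " + aware.getLocalName());
        // Not namespace-aware: prints "null / null / p:simp"
        System.out.println(plain.getNamespaceURI() + " / " + plain.getLocalName()
                + " / " + plain.getTagName());
    }

    static String xml(String s) {
        return s.replace('{', '<').replace('}', '>');
    }

    // Parses the text and returns its root element. setNamespaceAware defaults
    // to false, in which case prefixes are never resolved to namespace URIs.
    static Element parseRoot(String text, boolean namespaceAware) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(namespaceAware);
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(text)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse: " + e.getMessage(), e);
        }
    }
}
```
</pre>
Without namespace processing, a prefixed element is only a tag name containing a colon. A namespace-qualified query therefore has nothing to match against.<br>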
Satheesh<br>
<br>
Army wrote:<br>
<blockquote cite="mid4294CF0A.5060406@sbcglobal.net" type="cite">Please
find attached the patch for adding initial XML support to Derby.&nbsp; While
the patch _is_ over 10k lines, note that most of that comes from two
XML files that are used in testing.
  <br>
  <br>
Comments/details of the patch are included below.&nbsp; Quoted text is
pasted from my initial description of the XML support I added, which
can be found here:
  <br>
  <br>
<a class="moz-txt-link-freetext" href="http://article.gmane.org/gmane.comp.apache.db.derby.devel/3602">http://article.gmane.org/gmane.comp.apache.db.derby.devel/3602</a>
  <br>
  <br>
----------------------
  <br>
-- Feature Description.
  <br>
----------------------
  <br>
  <br>
&gt; When creating the XML datatype, I have done so in such a way as to
make
  <br>
&gt; it possible to re-work the XML store to something smarter in the
  <br>
&gt; future--this textual representation is just an easy "first step"
to get
  <br>
&gt; things rolling.
  <br>
  <br>
I've organized the code so that there's a separation between the XML
datatype and its "type implementation", where a "type implementation"
defines how a particular XML value is read/written/processed.&nbsp; Right
now, the only type implementation I've written is a UTF8-based one that
stores/reads XML just like other Derby string types.&nbsp; More on that
below.
  <br>
  <br>
There are three primary classes that make up the full XML datatype
picture:
  <br>
  <br>
1) org.apache.derby.iapi.types.XMLDataValue
  <br>
  <br>
An interface defining the minimal methods that every XML data value
should support.&nbsp; The methods on this interface correlate to the XML
operations that I've added--namely, XMLPARSE, XMLSERIALIZE, and
XMLEXISTS.
  <br>
  <br>
2) org.apache.derby.iapi.types.XML
  <br>
  <br>
The XML datatype.&nbsp; This class implements both the XMLDataValue and the
DataType interfaces.&nbsp; For all DataType operations that are common to
every XML implementation, this XML class does the work.&nbsp; For DataType
operations that depend on the particular "XML type implementation" (see
below) being used, this XML class simply wraps another class that
handles implementation-specific operations.
  <br>
  <br>
3) org.apache.derby.impl.sql.xml.XMLImpl
  <br>
  <br>
This is the base class for what I call "XML type implementations"
(let's call it "XTI" in this email, to save me the effort of typing
it).&nbsp; An XML type implementation (XTI) determines how an XML data value
is to be written/read to/from disk, queried, and stored in memory.&nbsp; The
XMLImpl class defines the methods that every XTI (whether UTF8-based or
something smarter) must implement. &nbsp;This class is wrapped by the XML
class (#2) above and is used to handle any DataType calls that depend
directly on the XTI in use.
  <br>
  <br>
&gt; The on-disk format that I'm using is a simple textual
representation of
  <br>
&gt; XML.&nbsp; In other words, an XML document on disk is really just
stored as a
  <br>
&gt; UTF-8 character string (similar to other JDBC string types).
  <br>
  <br>
I have created a UTF8-based XTI with the class
  <br>
  <br>
org.apache.derby.impl.sql.xml.XML_UTF8Impl
  <br>
  <br>
which extends XMLImpl.&nbsp; This class takes the "easy way out" and just
wraps XML data as an instance of SQLChar.&nbsp; It reads/writes data in
UTF-8, just like other Derby string types.&nbsp; It uses the Xerces parser
to parse XML data and to check well-formedness, and it uses the XSLT
processor from Xalan to query.
  <br>
  <br>
This UTF8-based implementation is, of course, far from ideal.&nbsp; The fact
that we store XML data on disk as a string means that we have to
re-parse it every time we want to query it, which has obvious
performance issues.&nbsp; But it was an easy "first step" for XML and I hope
that future development can replace this with something smarter and
faster.
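  <br>
  <br>
The following minimal Java sketch is a rough illustration of how the datatype, the XTI base class and the UTF8-based XTI fit together. The classes XmlValue, Xti and Utf8Xti only play the roles of XML, XMLImpl and XML_UTF8Impl. Their names and three-method contract are assumptions made for this sketch, not the patch's real API. The sketch also uses the JDK's built-in JAXP parser instead of loading Xerces. Note that toDom() re-parses the stored UTF-8 bytes on every call, which is the performance cost described above.<br>
<pre wrap="">
```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical, simplified sketch of the datatype / XTI split.
// Class and method names are illustrative only.
class XtiSketch {
    public static void main(String[] args) {
        XmlValue v = new XmlValue(new Utf8Xti());
        v.parse(xml("{simp} doc {/simp}"));  // '{' and '}' stand in for angle brackets
        System.out.println(v.serialize());
        System.out.println(v.toDom().getDocumentElement().getTagName());
    }

    static String xml(String s) {
        return s.replace('{', '<').replace('}', '>');
    }
}

// Plays the role of XMLImpl: how a value is stored, read back and materialized.
abstract class Xti {
    abstract void store(String text);
    abstract String load();
    abstract Document toDom();
}

// Plays the role of XML_UTF8Impl: keep UTF-8 text and re-parse it on demand.
class Utf8Xti extends Xti {
    private byte[] utf8 = new byte[0];

    void store(String text) { utf8 = text.getBytes(StandardCharsets.UTF_8); }

    String load() { return new String(utf8, StandardCharsets.UTF_8); }

    Document toDom() {
        try {
            // Parses the stored bytes again on every call.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(utf8));
        } catch (Exception e) {
            throw new IllegalStateException("stored value is not well-formed", e);
        }
    }
}

// Plays the role of the XML datatype: common work here, the rest delegated.
class XmlValue {
    private final Xti impl;

    XmlValue(Xti impl) { this.impl = impl; }

    // Well-formedness checking does not depend on the storage format.
    void parse(String text) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(text)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        impl.store(text);
    }

    String serialize() { return impl.load(); }

    Document toDom() { return impl.toDom(); }
}
```
</pre>
A smarter XTI would replace only Utf8Xti, for example by caching the parsed tree. XmlValue would not change.<br>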
  <br>
  <br>
In order to add a new XTI, one simply needs to create a class that
extends "XMLImpl", implement all of the abstract methods, and then add
some logic in two methods defined on the XMLImpl class.&nbsp; The comments
in that file describe what those methods are and what the logic should be.
  <br>
  <br>
Note that the APIs used for XML processing (JAXP) are part of the JDK 1.4
class libraries, the same release that introduced JDBC 3.0, and thus are
inherently available in 1.4.1 JVMs.&nbsp; In addition,
the Xerces parser that we use is loaded dynamically at run time, which
means that the codeline WILL build even if Xerces doesn't exist in the
classpath.&nbsp; That said, though, since I use the Xerces parser, anyone
who wishes to _use_ XML in Derby will have to put Xerces in his/her
classpath--this is something we may want to revisit at a later date.&nbsp;
Nonetheless, if a user does NOT want to use XML, s/he does NOT have to
have Xerces in his/her classpath--that's another benefit of loading
Xerces dynamically: a user who uses Derby for "normal", non-XML reasons
is not required to have any additional jars in his/her classpath.
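  <br>
  <br>
The dynamic-loading approach can be sketched in plain Java as below. This is not the patch's code. The class OptionalParserCheck and its methods are hypothetical, and the probe uses the public Xerces class name org.apache.xerces.parsers.DOMParser only as an example. Because Xerces is looked up by name with Class.forName rather than referenced directly, nothing here needs Xerces at compile time. An error is raised only when XML is actually used.<br>
<pre wrap="">
```java
// Hypothetical sketch: detect an optional library by name at run time, so the
// engine compiles and runs without it and only XML operations require it.
class OptionalParserCheck {
    // A public Xerces class; probing for this particular name is an assumption.
    static final String XERCES_PARSER = "org.apache.xerces.parsers.DOMParser";

    static boolean isAvailable(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class.forName(className, false, OptionalParserCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Intended to be called only when a statement actually uses XML.
    static void requireXmlSupport() {
        if (!isAvailable(XERCES_PARSER)) {
            throw new IllegalStateException(
                    "XML support requires Xerces (" + XERCES_PARSER + ") on the classpath");
        }
    }

    public static void main(String[] args) {
        System.out.println("Xerces on classpath: " + isAvailable(XERCES_PARSER));
    }
}
```
</pre>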
  <br>
  <br>
&gt; All of the XML functionality that I've written for Derby is based
on the
  <br>
&gt; first (ISO approved) and second (still in development) editions of
the
  <br>
&gt; SQL/XML specification.
  <br>
  <br>
This is still true, and as mentioned in some earlier posts, this means
that the *** XML syntax we use is apt to change *** (esp for the
XMLEXISTS operator). Anyone using XML in Derby should be aware of this
fact.
  <br>
  <br>
&gt; A. Created an XML type that can be both transient (SQL/XML[2003]
X010)
  <br>
&gt; and persistent (SQL/XML[2003] X016).
  <br>
  <br>
Completed as described in my initial email.&nbsp; Ex:
  <br>
  <br>
ij&gt; CREATE TABLE xTable (i INT PRIMARY KEY, x XML);
  <br>
0 rows inserted/updated/deleted
  <br>
  <br>
&gt; B. Created an XMLPARSE function to parse XML (SQL/XML Feature
X061).
  <br>
  <br>
Completed as described in my initial email, with one exception.&nbsp; In my
initial email, I mentioned that it was up to Xerces to do schema
validation at parse time.&nbsp; Since then, I realized that the
SQL/XML[2003] spec explicitly states that XMLPARSE should NOT validate
a document.&nbsp; Thus, while XMLPARSE _will_ check the well-formedness of
the document and _will_ parse any associated DTDs to load defaults
and/or other DTD-related info, it will _not_ perform validation against
the DTD, nor will it validate against an XML Schema Document.
  <br>
  <br>
Syntax is as follows:
  <br>
  <br>
XMLPARSE( DOCUMENT &lt;string-value-expression&gt; PRESERVE WHITESPACE
)
  <br>
  <br>
Ex:
  <br>
  <br>
ij&gt; INSERT INTO xTable VALUES (1, XMLPARSE(DOCUMENT '&lt;simp&gt;
doc &lt;/simp&gt;' PRESERVE WHITESPACE));
  <br>
1 row inserted/updated/deleted
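  <br>
  <br>
The XMLPARSE rule described above maps fairly directly onto JAXP settings: well-formedness is checked and the DTD is read for defaults, but nothing is validated. The following is a minimal sketch assuming the JDK's built-in DocumentBuilderFactory rather than the Xerces classes the patch loads. ParseSketch, xmlParse() and xml() are hypothetical names, and xml() writes '{' and '}' in place of angle brackets.<br>
<pre wrap="">
```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Sketch of XMLPARSE-like behaviour with JAXP: reject documents that are not
// well-formed, read the internal DTD (so attribute defaults are applied), but
// do not validate. xml() is a hypothetical helper for writing angle brackets.
class ParseSketch {
    static String xml(String s) {
        return s.replace('{', '<').replace('}', '>');
    }

    static Document xmlParse(String text) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setValidating(false);  // no DTD validation (also the JAXP default)
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(text)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // The DTD declares simp EMPTY, yet " doc " is accepted because nothing is
        // validated; the ATTLIST default for lang is still applied.
        Document d = xmlParse(xml("{!DOCTYPE simp [{!ELEMENT simp EMPTY}"
                + "{!ATTLIST simp lang CDATA 'en'}]}{simp} doc {/simp}"));
        System.out.println(d.getDocumentElement().getAttribute("lang"));  // en
    }
}
```
</pre>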
  <br>
  <br>
&gt; C. Created an XMLSERIALIZE function to serialize an XML value into
a
  <br>
&gt; string (SQL/XML[2003] Feature X071).
  <br>
  <br>
Completed as described in my initial email.&nbsp; The syntax is:
  <br>
  <br>
XMLSERIALIZE( &lt;xml-value-expression&gt; AS &lt;string-data-type&gt;
)
  <br>
  <br>
Ex:
  <br>
  <br>
ij&gt; SELECT i, XMLSERIALIZE(x AS CHAR(20)) FROM xTable;
  <br>
I&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|2
  <br>
--------------------------------
  <br>
1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&lt;simp&gt; doc &lt;/simp&gt;
  <br>
  <br>
1 row selected
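  <br>
  <br>
Going the other way, serialization can be sketched with JAXP's identity Transformer. This is an illustration under stated assumptions, not the patch's implementation, which returns the stored text wrapped as SQLChar. SerializeSketch and its helpers are hypothetical, and '{' and '}' stand in for angle brackets. The XML declaration is omitted so the output matches the ij result above.<br>
<pre wrap="">
```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch of the XMLSERIALIZE direction: DOM back to text via an identity
// Transformer. Names are hypothetical; '{' and '}' stand in for angle brackets.
class SerializeSketch {
    static String xml(String s) {
        return s.replace('{', '<').replace('}', '>');
    }

    static Document parse(String text) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(text)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed", e);
        }
    }

    static String xmlSerialize(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Emit no "xml version" declaration, only the document itself.
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("cannot serialize", e);
        }
    }

    public static void main(String[] args) {
        String s = xmlSerialize(parse(xml("{simp} doc {/simp}")));
        System.out.println(s + " (" + s.length() + " chars)");
    }
}
```
</pre>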
  <br>
  <br>
&gt; D. Created an XMLEXISTS function for simple querying of XML values
  <br>
&gt; (SQL/XML[2004] Feature X096).
  <br>
  <br>
Completed as described in my initial email.&nbsp; The syntax is:
  <br>
  <br>
XMLEXISTS( &lt;xpath-expression&gt; PASSING BY VALUE
&lt;xml-value-expression&gt; )
  <br>
  <br>
Note, though, that this is based on the 2004 working draft of the spec,
and thus ** is susceptible to change ** in the future.
  <br>
  <br>
Ex:
  <br>
  <br>
ij&gt; SELECT i FROM xTable where XMLEXISTS('/simp' PASSING BY VALUE
x);
  <br>
I
  <br>
-----------
  <br>
1
  <br>
  <br>
1 row selected
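  <br>
  <br>
The XMLEXISTS semantics can be sketched as "true if the XPath expression selects at least one node". This minimal sketch swaps in javax.xml.xpath, which requires Java 5 or later and so was not an option for the 1.4 JVMs targeted here, in place of the Xalan XSLT processor the patch uses. ExistsSketch and its helpers are hypothetical names, and '{' and '}' stand in for angle brackets.<br>
<pre wrap="">
```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch of XMLEXISTS-style evaluation: true if the XPath selects any node.
class ExistsSketch {
    static String xml(String s) {
        return s.replace('{', '<').replace('}', '>');
    }

    static Document parse(String text) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(text)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed", e);
        }
    }

    static boolean xmlExists(String xpath, Document doc) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            return hits.getLength() > 0;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("not a node-set XPath: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        Document x = parse(xml("{simp} doc {/simp}"));
        System.out.println(xmlExists("/simp", x));                      // true
        System.out.println(xmlExists("/other", x));                     // false
        System.out.println(xmlExists("/simp[contains(., 'doc')]", x));  // true
    }
}
```
</pre>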
  <br>
  <br>
The details of all of these changes are included in the comments for
the files. &nbsp;I think I've done a pretty thorough job of commenting, but
people should let me know if they'd like more in any particular area.
  <br>
  <br>
----------------------
  <br>
-- Known issue.
  <br>
----------------------
  <br>
  <br>
In my initial email, I mentioned that I was going to disallow binding
to/from an XML parameter.&nbsp; While I have this working for embedded mode,
I still need to figure out how to enforce this in server mode.&nbsp; Since
the setXXX methods are implemented by the client, we need to look for
XML parameters at statement preparation time and throw compile-time
errors.&nbsp; I was looking at this for a while yesterday and, oddly enough,
couldn't nail it down--but hopefully I'm just missing something small.&nbsp;
Since that's the only issue that I know of with this patch, I thought
I'd send it out and let people start reviewing it while I look at the
binding problem.&nbsp; As a result, anyone who uses the attached patch and
then tries to bind a parameter to an XML value over the server is going
to have problems.&nbsp; But since the goal is to disallow that behavior
altogether (in a graceful manner, of course), hopefully people can just
avoid doing that until I have a fix...
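  <br>
  <br>
The approach described above is to catch XML parameters when the statement is compiled, so that embedded and client/server modes fail the same way before any client-side setXXX call. That check can be sketched as below. Everything here is hypothetical: Param is a simplified stand-in for the compiler's view of a '?' whose type has already been inferred, and the message text implies no real Derby SQLState.<br>
<pre wrap="">
```java
import java.sql.SQLException;

// Hypothetical sketch: reject XML-typed '?' parameters at prepare time.
// Not Derby's real classes; names and message text are illustrative.
class ParameterCheckSketch {
    // Simplified stand-in for one '?' parameter, with its type already
    // inferred from context (e.g. the target column of an INSERT).
    static final class Param {
        final int position;
        final String sqlTypeName;

        Param(int position, String sqlTypeName) {
            this.position = position;
            this.sqlTypeName = sqlTypeName;
        }
    }

    static void rejectXmlParameters(Param[] params) throws SQLException {
        for (Param p : params) {
            if ("XML".equalsIgnoreCase(p.sqlTypeName)) {
                throw new SQLException("Parameter " + p.position
                        + " has type XML; binding to or from XML is not allowed");
            }
        }
    }

    public static void main(String[] args) {
        // e.g. INSERT INTO xTable VALUES (?, ?): the second '?' is inferred as XML.
        Param[] params = { new Param(1, "INTEGER"), new Param(2, "XML") };
        try {
            rejectXmlParameters(params);
            System.out.println("prepared");
        } catch (SQLException e) {
            System.out.println("prepare failed: " + e.getMessage());
        }
    }
}
```
</pre>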
  <br>
  <br>
----------------------
  <br>
-- Patch details.
  <br>
----------------------
  <br>
  <br>
Since a built-in datatype tends to affect many areas, the patch
modifies a good number of files--but note that the changes to most of
those files are pretty minor.
  <br>
  <br>
The total patch is over 10,000 lines, but more than half of that is the
result of two 40k XML documents that I've added for the sake of
testing.&nbsp; And most of the rest is from new files--so no, that's not
10,000 lines of code changes ;)
  <br>
  <br>
I created two new directories.&nbsp; This means that, since the "patch"
command can't create directories on its own (at least, not the patch
command I use), you may need to create the directories manually BEFORE
applying the patch.&nbsp; The new directories are:
  <br>
  <br>
java/engine/org/apache/derby/impl/sql/xml
  <br>
java/testing/org/apache/derbyTesting/functionTests/tests/lang/xmlTestFiles
  <br>
  <br>
The first directory holds the "XML Type implementation" classes
mentioned above, along with a build.xml file that is needed so that the
XTIs are only built using JDK 1.4.&nbsp; The required XML APIs aren't in JDK
1.3 or prior, so Derby will not support XML for 1.3.
  <br>
  <br>
The second directory holds a bunch of files used for XML testing.
  <br>
  <br>
The results from an "svn stat" are attached to this email along with
the patch.
  <br>
  <br>
I ran the "derbylang" suite with Sun JDK 1.4.2 on Windows and all of
the tests passed.&nbsp; I haven't had a chance to run the full "derbyall"
suite yet, but plan to do that tonight. Yes, I realize that's very
important, and I certainly plan to do it ASAP--but I thought it'd be
good to get the patch out and have people start looking at it.&nbsp; If
there are any failures in "derbyall" when I run it locally tonight, I
will address them tomorrow.
  <br>
  <br>
Feedback is appreciated,
  <br>
Army
  <br>
  <pre wrap="">
<hr size="4" width="90%">
M      tools\jar\DBMSnodes.properties
M      java\engine\org\apache\derby\impl\sql\compile\NodeFactoryImpl.java
A      java\engine\org\apache\derby\impl\sql\compile\XMLSerializeOperatorNode.java
A      java\engine\org\apache\derby\impl\sql\compile\XMLConstantNode.java
M      java\engine\org\apache\derby\impl\sql\compile\SelectNode.java
M      java\engine\org\apache\derby\impl\sql\compile\QueryTreeNode.java
M      java\engine\org\apache\derby\impl\sql\compile\ResultColumn.java
M      java\engine\org\apache\derby\impl\sql\compile\C_NodeNames.java
A      java\engine\org\apache\derby\impl\sql\compile\XMLExistsOperatorNode.java
A      java\engine\org\apache\derby\impl\sql\compile\XMLParseOperatorNode.java
M      java\engine\org\apache\derby\impl\sql\compile\TypeCompilerFactoryImpl.java
M      java\engine\org\apache\derby\impl\sql\compile\RowResultSetNode.java
M      java\engine\org\apache\derby\impl\sql\compile\sqlgrammar.jj
M      java\engine\org\apache\derby\impl\sql\compile\DB2LengthOperatorNode.java
M      java\engine\org\apache\derby\impl\sql\compile\CharTypeCompiler.java
M      java\engine\org\apache\derby\impl\sql\compile\UnaryOperatorNode.java
M      java\engine\org\apache\derby\impl\sql\compile\ResultColumnList.java
A      java\engine\org\apache\derby\impl\sql\compile\XMLTypeCompiler.java
M      java\engine\org\apache\derby\impl\sql\build.xml
A      java\engine\org\apache\derby\impl\sql\xml
A      java\engine\org\apache\derby\impl\sql\xml\XMLImpl.java
A      java\engine\org\apache\derby\impl\sql\xml\XML_UTF8Impl.java
A      java\engine\org\apache\derby\impl\sql\xml\build.xml
M      java\engine\org\apache\derby\impl\sql\catalog\DataDictionaryImpl.java
M      java\engine\org\apache\derby\impl\jdbc\Util.java
M      java\engine\org\apache\derby\iapi\sql\compile\C_NodeTypes.java
M      java\engine\org\apache\derby\iapi\services\build.xml
M      java\engine\org\apache\derby\iapi\services\io\RegisteredFormatIds.java
M      java\engine\org\apache\derby\iapi\services\io\StoredFormatIds.java
A      java\engine\org\apache\derby\iapi\types\XML.java
M      java\engine\org\apache\derby\iapi\types\DataTypeUtilities.java
M      java\engine\org\apache\derby\iapi\types\build.xml
M      java\engine\org\apache\derby\iapi\types\SQLChar.java
M      java\engine\org\apache\derby\iapi\types\TypeId.java
M      java\engine\org\apache\derby\iapi\types\DataValueFactoryImpl.java
A      java\engine\org\apache\derby\iapi\types\XMLDataValue.java
M      java\engine\org\apache\derby\iapi\types\DTSClassInfo.java
M      java\engine\org\apache\derby\iapi\types\StringDataValue.java
M      java\engine\org\apache\derby\iapi\types\DataValueFactory.java
M      java\engine\org\apache\derby\iapi\reference\SQLState.java
M      java\engine\org\apache\derby\iapi\reference\ClassName.java
M      java\engine\org\apache\derby\catalog\types\TypesImplInstanceGetter.java
M      java\engine\org\apache\derby\catalog\types\BaseTypeIdImpl.java
M      java\engine\org\apache\derby\loc\messages_en.properties
A      java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlBinding.java
M      java\testing\org\apache\derbyTesting\functionTests\tests\lang\copyfiles.ant
A      java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles
A      java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\dtdDoc.xml
A      java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\personal.xsd
A      java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\xsdDoc.xml
A      java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\dtdDoc_invalid.xml
A      java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\wide40k.xml
A      java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\xsdDoc_invalid.xml
A      java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\deep40k.xml
A      java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\personal.dtd
A      java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlBinding_app.properties
A      java\testing\org\apache\derbyTesting\functionTests\tests\lang\xml_general.sql
A      java\testing\org\apache\derbyTesting\functionTests\master\DerbyNet\xml_general.out
A      java\testing\org\apache\derbyTesting\functionTests\master\xml_general.out
A      java\testing\org\apache\derbyTesting\functionTests\master\DerbyNetClient\xml_general.out
A      java\testing\org\apache\derbyTesting\functionTests\master\xmlBinding.out
M      java\testing\org\apache\derbyTesting\functionTests\suites\derbylang.runall
M      java\testing\org\apache\derbyTesting\functionTests\suites\derbynetmats.runall
  </pre>
  <pre wrap="">
<hr size="4" width="90%">
Index: tools/jar/DBMSnodes.properties
===================================================================
--- tools/jar/DBMSnodes.properties	(revision 178406)
+++ tools/jar/DBMSnodes.properties	(working copy)
@@ -114,3 +114,7 @@
 derby.module.cloudscapenodes.ge=org.apache.derby.impl.sql.compile.SavepointNode
 derby.module.cloudscapenodes.gf=org.apache.derby.impl.sql.compile.IntersectOrExceptNode
 derby.module.cloudscapenodes.gg=org.apache.derby.impl.sql.compile.UnaryDateTimestampOperatorNode
+derby.module.cloudscapenodes.gh=org.apache.derby.impl.sql.compile.XMLConstantNode
+derby.module.cloudscapenodes.gi=org.apache.derby.impl.sql.compile.XMLParseOperatorNode
+derby.module.cloudscapenodes.gj=org.apache.derby.impl.sql.compile.XMLSerializeOperatorNode
+derby.module.cloudscapenodes.gk=org.apache.derby.impl.sql.compile.XMLExistsOperatorNode
Index: java/engine/org/apache/derby/impl/sql/compile/NodeFactoryImpl.java
  </pre>
</blockquote>
</body>
</html>

