db-derby-dev mailing list archives

From Bryan Pendleton <bpendle...@amberpoint.com>
Subject DERBY-688: Some review comments and feedback
Date Sun, 06 Aug 2006 22:03:52 GMT
Hi Army, thanks for posting the patches and for continuing the work
on the XML support. I think this is going to be a great feature!

Here's my feedback; hope it's useful.



1) The patches read well. The comments are fantastic! The effort is greatly
    appreciated.

2) The patches all applied cleanly for me, once I locally edited the
    absolute file names in the patches. After each individual patch in the
    sequence, I had no problems re-building derby. So really no build problems
    to mention.

3) Who will run these tests, and when? If all the execution code is optional,
    how do we ensure that it doesn't get broken?

4) Can you further explain the BY VALUE vs. BY REF behaviors? What do these
    clauses mean, why is BY REF better, at what point would we want to
    re-introduce BY VALUE, and how does this manifest itself in the code?

5) If/when you re-generate the patches, please use relative path names for
    the files in the patches so that we don't get strings like
    c:/private/derby_src/java in the file names.

6) This is kind of a user-level question, and shows my ignorance about how
    XML support is supposed to fit into Derby: most of your examples and tests
    show the use of extremely tiny XML documents; they can fit into literal
    strings and are at most a few hundred bytes long. But in practice, XML
    documents are often ridiculously gigantic things which are hundreds of
    thousands of bytes long, and people try not to manipulate them in memory,
    but rather read them from files and write them to files, streaming them
    through parsers and into in-memory DOM trees only as needed.

    How does this work in Derby? Some questions that occur:
    a) If I have a large XML document in a file, how do I get that into my
       XML column in my database? Is it like a CLOB/BLOB where I work with
       some sort of a special stream class?
    b) The mirror-image question is how do I fetch a large XML document from
       my table and stream it to my file on my client efficiently?
    c) Internally, does the store use CLOB/BLOB techniques for XML storage?
       Does it store them in separate files?
    d) How does DRDA transmit XML over the net? Is it externalized data?

    Obviously, these questions are motivated by some of the work that
    Tomohito Nakayama and others have been doing recently with BLOB/CLOB
    efficiency, for example DERBY-326 and DERBY-550.

7) Another user-level question: in your test programs, your XML documents
    tend to be quite simple. They don't have the sorts of things that
    real-life XML documents have, like:
    a) <?xml ... ?> headers, with varying encodings and the like
    b) multiple namespaces with various namespace prefixes
    c) strange sections of escaped CDATA
    d) DTD declarations with external DTDs
    e) named external entities

    Presumably, since all of this is handled by the parser, "it just works".
    However, I'm a little confused about how the parsing happens in a
    client-server scenario: is the XMLPARSE processing performed on the
    client side? Or on the server side? I think this only becomes relevant
    when the user must do something to ensure that the XML parser and the
    XPATH/XQUERY engines are configured properly; they need to know which
    "side" (client/server) of their environment needs to be so configured.
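
    To make the list above concrete, here is a rough sketch of the sort of
    document I have in mind, parsed with the plain JAXP DOM API. The class
    name, namespace URIs, and document content are all made up for
    illustration, and the DTD and entity are declared internally (rather than
    externally, as in (d) and (e)) only so that the sketch is self-contained.
    It says nothing about how XMLPARSE itself drives the parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sample, not part of the DERBY-688 patches.
class XmlFeatureSample {

    // A small document with an XML declaration, a DOCTYPE with an
    // internal entity, two namespaces, and a CDATA section.
    static final String DOC =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
        "<!DOCTYPE order [\n" +
        "  <!ENTITY vendor \"Acme Corp\">\n" +
        "]>\n" +
        "<order xmlns=\"urn:example:orders\" xmlns:inv=\"urn:example:inventory\">\n" +
        "  <inv:item sku=\"42\">&vendor;</inv:item>\n" +
        "  <note><![CDATA[5 < 6 && \"quotes\" stay raw]]></note>\n" +
        "</order>\n";

    // Parse a string into a DOM tree with a namespace-aware,
    // non-validating parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Text content of the first element with the given namespace and name.
    static String text(Document doc, String ns, String localName) {
        return doc.getElementsByTagNameNS(ns, localName).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(DOC);
        System.out.println("root namespace: " + doc.getDocumentElement().getNamespaceURI());
        System.out.println("item text:      " + text(doc, "urn:example:inventory", "item"));
        System.out.println("note text:      " + text(doc, "urn:example:orders", "note"));
    }
}
```

    A test along these lines, run through XMLPARSE on both the embedded and
    client-server configurations, would show whether such documents really
    do "just work".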

8) We need to make sure that the documentation clearly specifies which versions
    of the add-on XML software (parsers, XPATH, etc.) are required, and we
    need to do our best to make the error messages when a bad version is used
    clear and specific. For example, XALAN 2.4 is bundled with the Sun 1.4
    JDK but it is probably far too old to be used successfully. Yet how to
    install a newer version as an endorsed standard, and how to recognize the
    error messages when the wrong version is being used, is pretty subtle
    right now.

9) When I run lang/xmlBinding.java, I see the following diff. This diff occurs
    in all three configurations I tried (embedded, DerbyNet, and DerbyNetClient).

-bash-2.05b$ java org.apache.derbyTesting.functionTests.harness.RunTest lang/xmlBinding.java
*** Start: xmlBinding jdk1.4.2_11 2006-08-05 17:28:52 ***
9 del
< Inserted roughly 40k of data.
10 del
< Inserted roughly 40k of data.
 > Inserted roughly 39k of data.
 > Inserted roughly 37k of data.
21 del
< 1, [ roughly 40k ]
22 del
< 2, [ roughly 40k ]
 > 1, [ roughly 39k ]
 > 2, [ roughly 37k ]
Test Failed.
*** End:   xmlBinding jdk1.4.2_11 2006-08-05 17:28:59 ***
