xml-general mailing list archives

From "Sean Kelly" <ke...@mail2a.jpl.nasa.gov>
Subject Re: How to get the value of the node
Date Fri, 12 May 2000 14:49:00 GMT
The reason is this: the DOM tree has a few more nodes in it than you're
expecting (I'm leaving out ignorable text nodes):

Document (root)
...Element "BOUGHTSTUFF"
......Element "STUFF"
.........Element "TYPE"
............Text "milk"
.........Element "EXPIRE"
............Text "6 may 2000"

So, when you get the EXPIRE element's value (getNodeValue()), you get null:
in the DOM, an Element node's value is always null.

But if you get the EXPIRE element's first child's value, you get "6 may
2000".
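
Here is a minimal sketch of that difference.  It assumes the JAXP parser API
(DocumentBuilderFactory) and a sample document reconstructed from the tree
above; the class name ExpireValueDemo is made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ExpireValueDemo {
    // Assumed sample input, reconstructed from the tree shown above.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<BOUGHTSTUFF><STUFF><TYPE>milk</TYPE>"
        + "<EXPIRE>6 may 2000</EXPIRE></STUFF></BOUGHTSTUFF>";

    /** Parses the sample document and returns its EXPIRE element. */
    static Node expireElement() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return doc.getElementsByTagName("EXPIRE").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Node expire = expireElement();
        // The Element itself carries no value.
        System.out.println(expire.getNodeValue());                 // null
        // The text lives in the Element's Text child.
        System.out.println(expire.getFirstChild().getNodeValue()); // 6 may 2000
    }
}
```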

But be warned: there can be multiple Text child nodes under an Element node,
particularly if there are entity references in the text.  For example,
depending on the parser, this XML document:

<?xml version="1.0"?>
<alpha>
  <beta>I like both vanilla &amp; chocolate.</beta>
</alpha>

would produce this DOM tree:

Document (root)
...Element "alpha"
......Element "beta"
.........Text "I like both vanilla "
.........Text "&"
.........Text " chocolate."

Again, I'm leaving out ignorable whitespace nodes for clarity.
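
To see why this matters, here is a small sketch that builds the "beta"
element by hand with three Text children (so it does not depend on how any
particular parser splits text) and then reads only the first child.  The
class name SplitTextDemo is hypothetical, and JAXP is assumed only as a way
to obtain an empty Document.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SplitTextDemo {
    /** Builds a "beta" element with three Text children, mirroring the tree above. */
    static Element buildBeta() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element beta = doc.createElement("beta");
            beta.appendChild(doc.createTextNode("I like both vanilla "));
            beta.appendChild(doc.createTextNode("&"));
            beta.appendChild(doc.createTextNode(" chocolate."));
            return beta;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Element beta = buildBeta();
        System.out.println(beta.getChildNodes().getLength());    // 3
        // Reading only the first child loses the rest of the sentence.
        System.out.println(beta.getFirstChild().getNodeValue()); // I like both vanilla
    }
}
```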

You can get around these multiple child Text nodes by using a routine like
the following:

        // Needs: import org.w3c.dom.Node;

        /** Get the text out of the given node.
         *
         * Algorithm taken from <cite>XML and Java</cite> by Maruyama, Tamura, and
         * Uramoto, Addison-Wesley 1999.
         *
         * @param node The node whose children contain text.
         * @return The text.
         */
        private static String text(Node node) {
                // Collect the text of all descendants into a fresh buffer.
                StringBuffer buffer = new StringBuffer();
                return text1(node, buffer);
        }

        /** Get the text out of a given node and into the given buffer.
         *
         * Algorithm taken from <cite>XML and Java</cite> by Maruyama, Tamura, and
         * Uramoto, Addison-Wesley 1999.
         *
         * @param node The node.
         * @param buffer The buffer.
         * @return The text.
         */
        private static String text1(Node node, StringBuffer buffer) {
                for (Node ch = node.getFirstChild(); ch != null; ch = ch.getNextSibling()) {
                        if (ch.getNodeType() == Node.ELEMENT_NODE
                            || ch.getNodeType() == Node.ENTITY_REFERENCE_NODE)
                                // The text is further down: recurse into this child.
                                buffer.append(text(ch));
                        else if (ch.getNodeType() == Node.TEXT_NODE)
                                buffer.append(ch.getNodeValue());
                }
                return buffer.toString();
        }

Another thing you can do is normalize the document tree by calling
normalize() on the document element.  That merges adjacent Text nodes into
one.  It does not get rid of ignorable white space nodes by itself; for
that, your document needs to use a DTD that tells the parser which white
space nodes it can get rid of, and the parser has to be set up to drop them.
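
As a rough sketch of what normalize() does to adjacent Text nodes, the
following builds an element with three Text children by hand and normalizes
it.  The class name NormalizeDemo is hypothetical, and JAXP is assumed only
as a way to get an empty Document.  Note that Text nodes separated by
something else (an EntityReference node, say) are not adjacent, so they stay
separate.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {
    /** Builds an element with three adjacent Text children, then normalizes it. */
    static Element normalizedBeta() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element beta = doc.createElement("beta");
            beta.appendChild(doc.createTextNode("I like both vanilla "));
            beta.appendChild(doc.createTextNode("&"));
            beta.appendChild(doc.createTextNode(" chocolate."));
            // Merge the adjacent Text nodes into a single one.
            beta.normalize();
            return beta;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Element beta = normalizedBeta();
        System.out.println(beta.getChildNodes().getLength());    // 1
        System.out.println(beta.getFirstChild().getNodeValue()); // I like both vanilla & chocolate.
    }
}
```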

It'd help a lot if you wrote a little recursive debugging utility to print
out your DOM tree.  It doesn't have to be fancy or anything, but it would
help you understand exactly what your tree looks like.
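
Something along these lines would do as a starting point.  It is only a
sketch: the class name DomDump, the helper names, and the output format
(three dots per level, like the trees above) are my own choices, and the
parse() helper assumes JAXP.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomDump {
    /** Returns an indented listing of the subtree rooted at the given node. */
    static String dump(Node node) {
        StringBuilder out = new StringBuilder();
        dump(node, 0, out);
        return out.toString();
    }

    private static void dump(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("...");
        }
        // Show the node's value if it has one (Text, Comment, ...), else its name.
        String label = node.getNodeValue() != null ? node.getNodeValue() : node.getNodeName();
        out.append(typeName(node.getNodeType())).append(" \"").append(label).append("\"\n");
        for (Node ch = node.getFirstChild(); ch != null; ch = ch.getNextSibling()) {
            dump(ch, depth + 1, out);
        }
    }

    private static String typeName(short type) {
        switch (type) {
            case Node.DOCUMENT_NODE:               return "Document";
            case Node.ELEMENT_NODE:                return "Element";
            case Node.TEXT_NODE:                   return "Text";
            case Node.CDATA_SECTION_NODE:          return "CDATASection";
            case Node.COMMENT_NODE:                return "Comment";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            case Node.ENTITY_REFERENCE_NODE:       return "EntityReference";
            default:                               return "Node(type " + type + ")";
        }
    }

    /** Convenience helper (assumes JAXP): parses XML held in a string. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(dump(parse("<alpha><beta>hi</beta></alpha>")));
    }
}
```

Run it on your own document and every whitespace Text node between elements
shows up as its own line, which makes it easy to see which child you are
actually reading.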

--Sean




