camel-dev mailing list archives

From "Siano, Stephan" <stephan.si...@sap.com>
Subject Question about documentType in XPathBuilder
Date Fri, 16 Jan 2015 12:52:18 GMT
Hi,

If you look into the XPathBuilder in Camel (specifically the doInEvaluateAs() method), you see
that the data to be evaluated with the XPath expression (a header or the body) is first
converted into the data type defined in the documentType attribute of the XPathBuilder. Afterwards
the expression is evaluated against that object (or against its node, if it is a DOMSource).
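
To make the order of operations concrete, here is a minimal sketch of the "convert first, then evaluate" flow in plain JAXP. It is not Camel's code: the class ConvertThenEvaluate and its convert() helper are hypothetical stand-ins for Camel's type converters and only handle String input. It illustrates the control flow only and says nothing about memory use.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ConvertThenEvaluate {

    // Hypothetical stand-in for Camel's type converters (String input only).
    public static Object convert(String xml, Class<?> documentType) throws Exception {
        InputSource source = new InputSource(new StringReader(xml));
        if (documentType == InputSource.class) {
            return source; // nothing has been parsed at this point
        }
        // The other two types require a fully parsed DOM tree up front.
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        if (documentType == Document.class) {
            return dom;
        }
        if (documentType == DOMSource.class) {
            return new DOMSource(dom);
        }
        throw new IllegalArgumentException("unsupported documentType: " + documentType);
    }

    // Step 1: convert to the configured documentType. Step 2: evaluate.
    public static String evaluate(String expression, String xml, Class<?> documentType) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Object document = convert(xml, documentType);
            if (document instanceof DOMSource) {
                // A DOMSource is unwrapped to its node before evaluation.
                return xpath.evaluate(expression, ((DOMSource) document).getNode());
            }
            if (document instanceof InputSource) {
                return xpath.evaluate(expression, (InputSource) document);
            }
            return xpath.evaluate(expression, document);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b><c>hello</c></b></a>";
        System.out.println(evaluate("/a/b/c", xml, Document.class));
        System.out.println(evaluate("/a/b/c", xml, DOMSource.class));
        System.out.println(evaluate("/a/b/c", xml, InputSource.class));
    }
}
```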

The default for the documentType is Document (DOM), which consumes a lot of memory.
On large XML documents (e.g. 100 MB), parsing into a DOM may lead to an OutOfMemoryError. If
Saxon is used for the transformation, the implementation is capable of using a TinyTree
instead of a Xerces DOM, which is much smaller. However, that doesn't help if the JVM goes
OOM while the Xerces parser is building the DOM tree, before the transformation even
takes place.

In the Java DSL it is possible to set the documentType on an XPath expression, as in:
        from("direct:setbody")
            .setBody(xpath("/a/b/c", Document.class)
                    .documentType(SAXSource.class)
                    .factory(new XPathFactoryImpl())
                    );

This route is capable of transforming much larger documents than the same route without the
.documentType(SAXSource.class) statement (InputSource will also work if the incoming data
has a type converter to InputSource).

In the XML DSL there is unfortunately no way to set the documentType.

I have some questions about that:

1. Does anybody know why Document was chosen as the default documentType?

2. Why is the documentType not configurable in the XML DSL? What would I need to do in order
to add an extra attribute to the XML DSL?

3. Wouldn't a more dynamic approach be better? E.g. if the data is a DOM tree from
the beginning, use that; if it's a SAXSource, use that one; and if it's something like an
InputStream or String, use an InputSource?
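
To illustrate what I mean in question 3, here is a rough sketch of such a selection in plain Java. The class DocumentTypeChooser and its choose() method are hypothetical names made up for this mail, not existing Camel API, and the fallback to InputSource assumes that a type converter to InputSource exists for the payload.

```java
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DocumentTypeChooser {

    // Hypothetical helper: picks a documentType from the runtime type of the payload.
    public static Class<?> choose(Object payload) {
        if (payload instanceof Node) {
            return Node.class;        // already a DOM tree: evaluate against it directly
        }
        if (payload instanceof DOMSource) {
            return DOMSource.class;   // wraps a DOM node: no re-parsing needed
        }
        if (payload instanceof SAXSource) {
            return SAXSource.class;   // keep the SAX source, avoid building a DOM up front
        }
        // InputStream, String, byte[] and similar: assume conversion to InputSource works.
        return InputSource.class;
    }

    public static void main(String[] args) {
        System.out.println(choose("<a/>").getName());
        System.out.println(choose(new SAXSource()).getName());
        System.out.println(choose(new DOMSource()).getName());
    }
}
```

The XPathBuilder could call something like this only when no documentType has been set explicitly, so existing routes would keep their current behaviour.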

What do you think about this?

Best regards
Stephan
