axis-java-dev mailing list archives

From "MURRAY,BRYAN (HP-FtCollins,ex1)" <>
Subject RE: [AXIS ARCH] - Message Internals
Date Thu, 01 Feb 2001 17:51:06 GMT

I was referring to a class of parsers called pull parsers rather than to a
specific implementation. A pull parser leaves you in control: you ask it for
the next token, rather than it telling you when the next token is available.
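
To make the pull style concrete, here is a minimal sketch using StAX
(`javax.xml.stream`). That pull API was standardized later (JSR 173, bundled
with Java 6), so it was not available when this message was written. The names
`PullSketch` and `firstText` are hypothetical. The caller drives the loop with
`next()` and simply returns once it has what it needs, so the remaining tokens
are never processed.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper illustrating pull parsing: the caller asks for each
// token with next() instead of being called back by the parser.
class PullSketch {
    // Returns the text of the first element whose local name matches, or
    // null if there is none. The loop stops as soon as the element is found.
    static String firstText(String xml, String localName) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    int event = r.next();              // caller requests the next token
                    if (event == XMLStreamConstants.START_ELEMENT
                            && r.getLocalName().equals(localName)) {
                        return r.getElementText();     // text up to the matching end tag
                    }
                }
                return null;
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
    }
}
```

For example, `PullSketch.firstText("<a><b>1</b><c>two</c></a>", "c")` stops
at the first `<c>` and returns its text.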

DOM parsers (the Xerces DOM parser is one example) generally do all of their
parsing when they are first given the source data. You then have routines
for traversing the parse tree, which is held entirely in memory.
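
A minimal sketch of the DOM style, using the JAXP API (`javax.xml.parsers`),
which Xerces implements. `DomSketch`, `parse` and `countElements` are
hypothetical names. Parsing completes before any application code looks at the
data; afterwards the whole tree is in memory and can be walked as often as
needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical DOM illustration: parse everything up front, then traverse.
class DomSketch {
    // Builds the complete in-memory tree for the document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            DocumentBuilder b = f.newDocumentBuilder();
            return b.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Counts element nodes by recursively walking the already-built tree.
    static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            count += countElements(children.item(i));
        }
        return count;
    }
}
```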

SAX parsers (the Xerces SAX parser is one example) require you to provide a
set of event routines which the parser calls as it reads the source data. In
those routines you can either process the tokens as they are received, or
build your own parse tree holding only the information that is important to
your application. This generally uses much less memory than a DOM tree,
because you can tailor the objects to the needs of your application.
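
A minimal sketch of a tailored SAX handler, using the standard
`org.xml.sax` / JAXP classes. `LeafTextCollector` and `collect` are
hypothetical. Instead of a tree, it keeps only the text of leaf elements
(elements without child elements), which for a simple RPC body are the
argument values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler: the parser calls these routines as it reads,
// and only the leaf-element text is retained.
class LeafTextCollector extends DefaultHandler {
    final List<String> values = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean sawChild;   // true once the current element has a child element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        sawChild = false;       // new element: no children seen yet
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);   // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!sawChild) values.add(text.toString().trim());  // a leaf: keep its text
        sawChild = true;        // the enclosing element now has a child
        text.setLength(0);
    }

    static List<String> collect(String xml) {
        try {
            LeafTextCollector h = new LeafTextCollector();
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setNamespaceAware(true);
            f.newSAXParser().parse(new InputSource(new StringReader(xml)), h);
            return h.values;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```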

The pull parser which was recently posted to this list is not mine, but I
have done some experimenting with pull parsers and like how they work.

As for wanting to make use of existing technologies, I generally agree.
However, if I want to use a pull parser I cannot use Xerces, because the
Xerces developers have not written one. Perhaps the pull parser recently
posted to this list could be used. Also, the Xerces parsers are very slow. I
have measured the Xerces DOM and SAX parsers on a SOAP message with no
headers, representing an RPC call with a string and an int argument. Over a
run of 10000 parses of that document, the DOM parser managed 5.8 parses per
second and the SAX parser 11.2 parses per second. I believe it should be
possible to parse the same XML document more than 500 times per second.
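
Converting the quoted rates into time per parse (plain arithmetic on the
figures above, no new measurements):

```latex
t_{\text{DOM}} = \tfrac{1}{5.8}\ \text{s} \approx 172\ \text{ms}, \qquad
t_{\text{SAX}} = \tfrac{1}{11.2}\ \text{s} \approx 89\ \text{ms}, \qquad
t_{\text{target}} = \tfrac{1}{500}\ \text{s} = 2\ \text{ms}
```

So the 500-per-second target is about 45 times the measured SAX rate
(500/11.2 is roughly 44.6) and about 86 times the DOM rate (500/5.8 is roughly
86.2). At the measured rates, 10000 parses take roughly 1724 s with DOM and
893 s with SAX.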

I have no intention of writing another digital signature tool. I was only
trying to describe a method by which digital signatures could be verified
using a streaming technique in the SOAP processor.

Bryan Murray

-----Original Message-----
From: Yuhichi Nakamura []
Sent: Wednesday, January 31, 2001 6:02 PM
Subject: RE: [AXIS ARCH] - Message Internals

Hi Bryan,
I just want to make sure what you mean by a pull parser. Do you mean an
actual, working parser or just a concept? I thought that, in this context,
DOM is a pull parser and SAX is a push parser. Maybe I am wrong; please
correct me.

For digital signatures, there is a DOM-based tool (very stable) on IBM
alphaWorks (it actually comes from our team). Do you really want to build
yet another dig-sig tool in this project? I think that we need to adopt
"stable" modules as much as possible.

Your performance items are very reasonable. My question is: does such a
parser already exist, or would we develop one in this project?

IMHO, we should not assume things that do not exist. The Axis engine should
be developed on top of existing technologies, so we should not reinvent
similar things in this project. At this moment, I feel that Xerces is the
most appropriate choice for the parser.


Yuhichi Nakamura
IBM Research, Tokyo Research Laboratory
Tel: +81-46-215-4668
FAX: +81-46-215-7413

From: "MURRAY,BRYAN (HP-FtCollins,ex1)" <> on 2001/02/01

Please respond to

To:   "''" <>
Subject:  RE: [AXIS ARCH] - Message Internals

I agree that a pull parser is easier to use than either DOM or SAX, because
it leaves control in the hands of the code invoking the parser rather than
handing it over to the parser. I also believe it is the only way to achieve
the streaming message approach, mostly because of where that control lies.
SAX has a chance at streaming only if you are willing to call handlers from
within the event callbacks, and that sounds really difficult to control.

It is true that multiref arguments will be difficult to handle, but these
are likely to occur primarily in the Body, and a Body processor will need to
read the remainder of the message anyway. Header checking and mustUnderstand
validation can be done at the time the headers are parsed, long before the
message Body is processed. Some support for delayed processing may need to
exist in order to fully support this structure, but it does not have to be
the main path for all messages.
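
A minimal sketch of checking mustUnderstand while the headers are being
parsed, using the StAX pull API (`javax.xml.stream`), which was standardized
later (JSR 173, Java 6) and did not exist when this was written.
`HeaderCheckSketch`, `checkHeaders` and the `understood` set are hypothetical.
The namespace is the SOAP 1.1 envelope namespace, where `mustUnderstand="1"`
marks a mandatory header. The method returns the reader positioned on the Body
start tag, so a body processor could continue streaming from there; the caller
is responsible for closing it. A string input is used only for brevity.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: validate SOAP 1.1 mustUnderstand headers as they are
// pulled, and stop before reading any of the Body.
class HeaderCheckSketch {
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    static XMLStreamReader checkHeaders(String xml, Set<QName> understood) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int depth = 0;            // 1 = Envelope, 2 = Header/Body, 3 = header entry
            boolean inHeader = false;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == 2) {
                        boolean soap = SOAP_ENV.equals(r.getNamespaceURI());
                        if (soap && "Body".equals(r.getLocalName())) {
                            return r;   // all headers checked; Body not yet read
                        }
                        inHeader = soap && "Header".equals(r.getLocalName());
                    } else if (depth == 3 && inHeader) {
                        String mu = r.getAttributeValue(SOAP_ENV, "mustUnderstand");
                        if ("1".equals(mu) && !understood.contains(r.getName())) {
                            throw new IllegalStateException("MustUnderstand fault: " + r.getName());
                        }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            throw new IllegalArgumentException("no SOAP Body found");
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
    }
}
```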

One way the digital signature verifier could work with the streaming
approach is to handle the header carrying the digital signature, save away
the information needed to perform the signature verification later, and
insert another handler immediately before body processing which actually
performs the verification as it streams the body to the body processor.
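
A minimal sketch of the deferred, streaming check, using
`java.security.DigestInputStream`. This is a simplification, not XML
Signature: the W3C XML-Signature spec digests the canonicalized referenced
element and verifies a signature over `SignedInfo`, whereas this sketch only
hashes the raw body bytes and compares them with a digest assumed to have been
extracted from the header. `StreamingDigestSketch`, `wrapBody`, `verify` and
the stand-in `consume` body processor are hypothetical. The body bytes are
read once, and the digest is computed as a side effect of that single read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of deferred verification while the body streams by.
class StreamingDigestSketch {
    // Step 1 (header time): remember what must be checked later.
    private final byte[] expectedDigest;

    StreamingDigestSketch(byte[] expectedDigestFromHeader) {
        this.expectedDigest = expectedDigestFromHeader.clone();
    }

    // Step 2 (just before body processing): every byte the body processor
    // reads through the returned stream is also fed to the digest.
    DigestInputStream wrapBody(InputStream body) {
        try {
            return new DigestInputStream(body, MessageDigest.getInstance("SHA-256"));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Step 3 (after the body processor has finished): compare digests.
    boolean verify(DigestInputStream consumedBody) {
        byte[] actual = consumedBody.getMessageDigest().digest();
        return MessageDigest.isEqual(expectedDigest, actual);
    }

    // Stand-in body processor: streams through the bytes, counting them.
    static long consume(InputStream in) {
        try {
            byte[] buf = new byte[4096];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) total += n;
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```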

In order to achieve optimal performance I think we should strive to:
     read the message bytes no more than once
     parse the message bytes no more than once
     traverse the message no more than once
     keep as little of the message in memory at one time as possible
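
A minimal sketch of the four goals applied to a simple RPC call, using the
StAX pull API (`javax.xml.stream`, standardized later in JSR 173 / Java 6).
`OnePassRpcReader` is hypothetical, and it assumes the layout
Envelope > Body > method element > text-only argument elements, matching
elements by local name only and ignoring multiref/href. The input stream is
read once, each token is parsed and visited once, and only the method name and
argument strings are retained; no tree is built. Converting an argument such
as `"42"` to an int would need type information this sketch does not handle.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical one-pass RPC reader that keeps only the call's essentials.
class OnePassRpcReader {
    final String method;
    final List<String> args;

    private OnePassRpcReader(String method, List<String> args) {
        this.method = method;
        this.args = args;
    }

    static OnePassRpcReader read(InputStream in) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                int depth = 0;            // 1 = Envelope, 2 = Body, 3 = method, 4 = argument
                boolean inBody = false;
                String method = null;
                List<String> args = new ArrayList<>();
                while (r.hasNext()) {
                    int event = r.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        depth++;
                        if (depth == 2) {
                            inBody = "Body".equals(r.getLocalName());
                        } else if (depth == 3 && inBody) {
                            method = r.getLocalName();
                        } else if (depth == 4 && inBody) {
                            args.add(r.getElementText());  // reader is now on the arg's end tag
                            depth--;                        // that END_ELEMENT will not be returned by next()
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        depth--;
                    }
                }
                return new OnePassRpcReader(method, args);
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed or unsupported XML", e);
        }
    }
}
```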

Bryan Murray
