Mailing-List: contact axis-dev-help@xml.apache.org; run by ezmlm
Message-ID: <E97137CF9A25D311902E00A0C9F484E006A7F5E5@xfc03.fc.hp.com>
From: "MURRAY,BRYAN (HP-FtCollins,ex1)" <bryan_murray@hp.com>
To: "'axis-dev@xml.apache.org'" <axis-dev@xml.apache.org>
Subject: RE: [AXIS ARCH] - Message Internals
Date: Wed, 31 Jan 2001 10:10:11 -0800

I agree that a pull parser is easier to use than either DOM or SAX, because
it leaves control in the hands of the code that invokes the parser rather
than handing it over to the parser. I also believe it is the only way to
achieve the streaming message approach, mostly because of this question of
who holds control. SAX has a chance at streaming only if you are willing to
call handlers from within the event callbacks - and that sounds really
difficult to control.
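
To make the point about control concrete, here is a minimal sketch in Java.
It assumes a pull API shaped like javax.xml.stream (StAX), which is only a
stand-in and not something this thread has agreed on. The class name
PullHeaderScan and the sample envelope are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; none of these names come from Axis.
final class PullHeaderScan {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Made-up sample message with two header entries and one body entry.
    static final String SAMPLE =
        "<s:Envelope xmlns:s='" + SOAP_NS + "'>"
        + "<s:Header>"
        + "<t:Transaction xmlns:t='urn:example:tx'>5</t:Transaction>"
        + "<r:Routing xmlns:r='urn:example:route'/>"
        + "</s:Header>"
        + "<s:Body><m:echo xmlns:m='urn:example:echo'>hi</m:echo></s:Body>"
        + "</s:Envelope>";

    // Collects the local names of the header entries and stops pulling
    // events as soon as the start tag of the Body is seen.
    static List<String> headerEntryNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int depth = 0;            // 1 = Envelope, 2 = Header/Body, 3 = entry
            boolean inHeader = false;
            while (r.hasNext()) {
                int event = r.next(); // the caller asks for each event
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    boolean soap = SOAP_NS.equals(r.getNamespaceURI());
                    if (depth == 2 && soap && "Header".equals(r.getLocalName())) {
                        inHeader = true;
                    } else if (depth == 2 && soap && "Body".equals(r.getLocalName())) {
                        break;        // the caller decides to stop here
                    } else if (depth == 3 && inHeader) {
                        names.add(r.getLocalName());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth == 2) {
                        inHeader = false;
                    }
                    depth--;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("message is not well-formed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(headerEntryNames(SAMPLE));
    }
}
```

The loop belongs to the caller, so stopping at the Body is just a break. With
SAX the equivalent loop lives inside the parser, and the same decision has to
be made from within a callback.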

It is true that multiref arguments will be difficult to handle, but these
are likely to occur primarily in the Body, and a Body processor will need
to read the remainder of the message anyway. Header checking and
mustUnderstand validation can be done at the time the headers are parsed -
long before the message Body is processed. Some support for delayed
processing may need to exist in order to fully support this structure, but
it does not have to be the mainline path for all messages.
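
A rough sketch of the header check, again in Java and again assuming a
StAX-style pull API as a stand-in. The class MustUnderstandCheck, the
"understood" set, and the sample message are hypothetical, and the actor
attribute is deliberately ignored to keep the example short.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; none of these names come from Axis.
final class MustUnderstandCheck {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    static final QName HEADER = new QName(SOAP_NS, "Header");
    static final QName BODY = new QName(SOAP_NS, "Body");

    // Made-up sample: Transaction is mandatory, Routing is optional.
    static final String SAMPLE =
        "<s:Envelope xmlns:s='" + SOAP_NS + "'>"
        + "<s:Header>"
        + "<t:Transaction xmlns:t='urn:example:tx' s:mustUnderstand='1'>5</t:Transaction>"
        + "<r:Routing xmlns:r='urn:example:route'/>"
        + "</s:Header>"
        + "<s:Body><m:echo xmlns:m='urn:example:echo'>hi</m:echo></s:Body>"
        + "</s:Envelope>";

    // Returns the first header entry marked mustUnderstand="1" whose name is
    // not in `understood`, or null if Body processing may begin. The Body
    // content itself is never pulled from the parser.
    static QName firstNotUnderstood(String xml, Set<QName> understood) {
        QName offending = null;
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int depth = 0;
            boolean inHeader = false;
            while (offending == null && r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    QName name = r.getName();
                    if (depth == 2 && HEADER.equals(name)) {
                        inHeader = true;
                    } else if (depth == 2 && BODY.equals(name)) {
                        break;  // all headers checked before the Body is touched
                    } else if (depth == 3 && inHeader) {
                        String mu = r.getAttributeValue(SOAP_NS, "mustUnderstand");
                        if ("1".equals(mu) && !understood.contains(name)) {
                            offending = name;  // would become a MustUnderstand fault
                        }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth == 2) {
                        inHeader = false;
                    }
                    depth--;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("message is not well-formed XML", e);
        }
        return offending;
    }
}
```

Because the Header precedes the Body in the envelope, this check completes
before any Body entry has been parsed, and nothing needs to be buffered for
it.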

One way to implement the digital signature verifier with the streaming
approach is to handle the header that carries the digital signature, save
the information needed to verify the signature later, and insert another
handler immediately before the body processing. That handler performs the
actual signature verification as it streams the body to the body processor.
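
As a sketch of that arrangement in Java: the example below uses a plain
SHA-256 digest over the raw body bytes as a stand-in for a real XML
signature (no canonicalization, no keys), so it only illustrates the data
flow. The class StreamingDigestCheck and its methods are hypothetical; the
"expected digest" plays the role of the information saved by the header
handler.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; none of these names come from Axis.
final class StreamingDigestCheck {

    // Stand-in for the body processor: drains the stream it is handed.
    static byte[] consumeBody(InputStream body) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int n;
        while ((n = body.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Helper used here only to produce the value a sender would have put
    // in the header.
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // The handler inserted just before body processing: it wraps the body
    // stream so the digest is updated as the body processor reads, then
    // compares the result with what the header handler saved.
    static boolean processAndVerify(InputStream rawBody, byte[] expectedDigest) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            InputStream verifying = new DigestInputStream(rawBody, md);
            consumeBody(verifying);  // body bytes are read exactly once
            return MessageDigest.isEqual(md.digest(), expectedDigest);
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

One consequence of this design is that the verification result is only known
after the body processor has seen the whole body, so whatever the body
processor does must be deferrable or reversible until the check passes.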

In order to achieve optimal performance I think we should strive to:
	read the message bytes no more than once
	parse the message bytes no more than once
	traverse the message no more than once
	keep as little of the message in memory at one time as possible

Bryan Murray


-----Original Message-----
From: James Snell [mailto:jmsnell@intesolv.com]
Sent: Tuesday, January 30, 2001 12:31 PM
To: 'axis-dev@xml.apache.org'
Subject: RE: [AXIS ARCH] - Message Internals


Sam,

I do think the pull style parser model is best, but I do not think that the
streaming message approach will work for SOAP messages for several key
reasons:

1. The SOAP specification requires that a determination be made whether or
not a message can be processed before it is actually processed.  This
determination includes checking all of the headers for mustUnderstand and
actor attributes.  

2. SOAP's use of accessor multireferencing (id/href) allows for
forward/backwards/external references that may not be resolvable in a
stream, since the target of a reference may not have been received into the
stream yet.

An obvious example of this would be an XML signature verifier where the
signature is in the header and the data signed is in the body.  If we use
the streaming approach, then there is the potential that the signed data
won't be available by the time the digital signature verifier is invoked.

The only way that I can see to properly support these two items is to defer
processing until the entire message is received.

- James

> -----Original Message-----
> From: Sam Ruby [mailto:rubys@us.ibm.com]
> Sent: Tuesday, January 30, 2001 6:22 AM
> To: axis-dev@xml.apache.org
> Subject: RE: [AXIS ARCH] - Message Internals
> 
> 
> Yuhichi Nakamura wrote:
> >
> > I just read through this thread.  However, I am not sure
> > how SAX is useful in the context of SOAP message processing.
> > In order to process SOAP messages, we need to "manipulate"
> > XML documents in such a way that header entries are removed,
> > inserted, and potentially modified.  (Body entries might be
> > manipulated in the same manner, but at least header entries
> > MUST be processed by the Axis engine.)
> 
> It is my intuition, experience, and reading of the current literature that
> retaining the message in memory is not a scalable solution.  I've cited as
> an example the recent Cocoon rewrite, and pointed to Microsoft
> documentation indicating that they have hit upon a similar problem and
> outlined their solution.
> 
> Feel free to disagree with the above.  It is my point of view, perhaps
> there are others out there.
> 
> But if you do see the potential for this being a problem, and you have any
> hope for Axis to be successful and therefore deployed in enterprise
> configurations, an alternative must be found.  If not now, it will
> certainly be done in the *next* rewrite.
> 
> Avoiding discussions of a specific API for a moment, what is needed is a
> streaming model.  Headers need to be made available to handlers as they are
> being received.  A given handler could choose to do various things with
> this information - pass it along unmodified, choose NOT to pass it along
> (effectively deleting it), or create a new header based on information in
> the original.  In fact, a handler could easily insert a new header into the
> output stream.
> 
> There are two basic approaches to streaming: a PUSH model, which SAX
> represents, or a PULL model, which some of the APIs that have been
> submitted to ECMA for standardization represent.  Between these two
> alternatives, James seems to favor a pull model.  I'm inclined to agree.
> 
> - Sam Ruby