Subject: RE: [AXIS ARCH] - Message Internals
To: axis-dev@xml.apache.org
Message-ID: <OF9D181CCC.FD3D2FD7-ON492569E6.000502B0@LocalDomain>
From: "Yuhichi Nakamura" <NAKAMURY@jp.ibm.com>
Date: Thu, 1 Feb 2001 11:02:26 +0900


Hi Bryan,
I just want to make sure I understand what "pull parser" means.  Do you
mean a particular existing parser or just a concept?  I thought that, in
this context, DOM is a pull parser and SAX is a push parser.  Maybe I am
wrong; please correct me.

For Digital Signature, there exists a DOM-based tool (very stable) on IBM
alphaWorks (actually, it comes from our team).  Do you really want to
develop yet another dig-sig tool in this project?  I think that we need to
adopt existing "stable" modules as much as possible.

Your performance goals are very reasonable.  I would ask: do we already
have such a parser, or would we develop one in this project?

IMHO, we should not assume things that do not exist.  The Axis engine
should be developed on top of existing technologies, so we should not
reinvent similar things in this project.  At this moment, I feel that
Xerces is the most appropriate choice for the parser.

Regards,

Yuhichi Nakamura
IBM Research, Tokyo Research Laboratory
Tel: +81-46-215-4668
FAX: +81-46-215-7413


From: "MURRAY,BRYAN (HP-FtCollins,ex1)" <bryan_murray@hp.com> on 2001/02/01
      03:10

Please respond to axis-dev@xml.apache.org

To:   "'axis-dev@xml.apache.org'" <axis-dev@xml.apache.org>
cc:
Subject:  RE: [AXIS ARCH] - Message Internals


I agree that a pull parser is easier to use than either DOM or SAX, because
it leaves control in the hands of the parser invoker rather than handing it
over to the parser. I also believe it is the only way to achieve the
streaming message approach, mostly because of this question of who holds
control. SAX has a chance at streaming only if you are willing to call
handlers from the event callbacks - this sounds really difficult to
control.
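
To illustrate the difference in who holds control, here is a minimal
sketch using Python's standard library as a stand-in. It is not a proposal
for the Axis parser; the function names, the sample document, and the chunk
size are assumptions made only for illustration.

```python
import xml.sax
from xml.etree.ElementTree import XMLPullParser

# Hypothetical sample message, namespaces omitted for brevity.
DOC = b"<Envelope><Header><h1/></Header><Body><op/></Body></Envelope>"


class PushCollector(xml.sax.ContentHandler):
    """Push (SAX) style: the parser owns the loop and calls back into us."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


def parse_push(data):
    handler = PushCollector()
    # parseString drives the whole document before it returns.
    xml.sax.parseString(data, handler)
    return handler.seen


def parse_pull_until(data, stop_tag, chunk_size=16):
    """Pull style: the caller owns the loop and decides when to stop."""
    parser = XMLPullParser(events=("start",))
    seen = []
    for i in range(0, len(data), chunk_size):
        parser.feed(data[i:i + chunk_size])
        for _event, elem in parser.read_events():
            seen.append(elem.tag)
            if elem.tag == stop_tag:
                # The caller stops asking; later chunks are never fed.
                return seen
    return seen
```

With the push version the caller only gets control back after the last
callback has fired. With the pull version the caller can stop at the Body
element and hand the rest of the input to something else.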

It is true that multiref arguments will be difficult to handle, but these
are likely to occur primarily in the Body, and a Body processor will need
to read the remainder of the message anyway. Header checking and
mustUnderstand validation can be done at the time the headers are parsed -
long before the message Body is processed. Some support for delayed
processing may need to exist in order to fully support this structure - it
does not have to be the mainline for all messages.
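
A rough sketch of checking mustUnderstand while the headers are parsed,
before any Body content is handed to a processor. It uses Python's
iterparse as a stand-in and the SOAP 1.1 envelope namespace. The function
name check_headers, the "understood" set, and the sample message are
assumptions made for illustration, not Axis APIs.

```python
import io
from xml.etree.ElementTree import iterparse

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
HEADER = "{%s}Header" % SOAP_NS
BODY = "{%s}Body" % SOAP_NS
MUST_UNDERSTAND = "{%s}mustUnderstand" % SOAP_NS

# Hypothetical message with one mandatory and one optional header entry.
SAMPLE = (
    b'<e:Envelope xmlns:e="http://schemas.xmlsoap.org/soap/envelope/"'
    b' xmlns:x="urn:example">'
    b'<e:Header><x:txn e:mustUnderstand="1"/><x:trace/></e:Header>'
    b'<e:Body><x:op/></e:Body></e:Envelope>'
)


def check_headers(stream, understood):
    """Return header entry tags; raise if a mandatory entry is unknown.

    Stops at the start of the Body, so no Body element is processed here.
    """
    depth = 0
    in_header = False
    entries = []
    for event, elem in iterparse(stream, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 2 and elem.tag == HEADER:
                in_header = True
            elif depth == 2 and elem.tag == BODY:
                return entries
            elif in_header and depth == 3:
                # Attributes are already available on the start event.
                mandatory = elem.get(MUST_UNDERSTAND) == "1"
                if mandatory and elem.tag not in understood:
                    raise ValueError("not understood: " + elem.tag)
                entries.append(elem.tag)
        else:
            depth -= 1
            if depth == 1 and elem.tag == HEADER:
                in_header = False
    return entries
```

Note that iterparse reads its input in blocks, so for a small message the
Body bytes may already have been read; the point of the sketch is only that
the decision is made before the Body is processed.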

One way to implement the digital signature verifier with the streaming
approach is to handle the header that carries the digital signature, save
the information needed to verify the signature later, and insert another
handler immediately before body processing that performs the verification
as it streams the body to the body processor.
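
The deferred-verification idea above can be sketched as follows. This is a
heavy simplification and not XML Signature: it assumes the header carried a
plain SHA-256 digest of the raw body bytes, with no canonicalization and no
key material. The class name DigestCheckingStream is hypothetical.

```python
import hashlib
import hmac


class DigestCheckingStream:
    """Forward body chunks downstream while hashing them on the way.

    expected_hex is the value saved earlier by the header handler
    (an assumption: a hex SHA-256 digest of the raw body bytes).
    """

    def __init__(self, chunks, expected_hex):
        self._chunks = chunks
        self._expected = expected_hex
        self._hash = hashlib.sha256()

    def __iter__(self):
        for chunk in self._chunks:
            self._hash.update(chunk)   # verify as we stream
            yield chunk                # the body processor sees it at once
        # Only after the last chunk can the check be completed.
        if not hmac.compare_digest(self._hash.hexdigest(), self._expected):
            raise ValueError("body digest mismatch")
```

One consequence worth noting: the body processor has already consumed the
chunks by the time a mismatch is detected, so it must be able to discard or
roll back its work when the check fails.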

In order to achieve optimal performance I think we should strive to:
     read the message bytes no more than once
     parse the message bytes no more than once
     traverse the message no more than once
     keep as little of the message in memory at one time as possible

Bryan Murray


-----Original Message-----
From: James Snell [mailto:jmsnell@intesolv.com]
Sent: Tuesday, January 30, 2001 12:31 PM
To: 'axis-dev@xml.apache.org'
Subject: RE: [AXIS ARCH] - Message Internals


Sam,

I do think the pull style parser model is best, but I do not think that the
streaming message approach will work for SOAP messages for several key
reasons:

1. The SOAP specification requires that a determination be made whether or
not a message can be processed before it is actually processed.  This
determination includes checking all of the headers for mustUnderstand and
actor attributes.

2. SOAP's use of accessor multireferencing (id/href) allows for
forward/backward/external references that may not be resolvable in the
stream, because the target of a reference may not have been received into
the stream yet.

An obvious example of this would be an XML signature verifier where the
signature is in the header and the data signed is in the body.  If we use
the streaming approach, then there is the potential that the signed data
won't be available by the time the digital signature verifier is invoked.

The only way that I can see to properly support these two items is to
defer processing until the entire message is received.
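
The forward-reference problem in point 2 can be made concrete with a small
sketch. It uses Python's XMLPullParser as a stand-in, handles only local
"#id" references, and uses unqualified id/href attributes; the function
name resolve_multirefs and the returned structures are assumptions made
for illustration.

```python
from xml.etree.ElementTree import XMLPullParser


def resolve_multirefs(chunks):
    """Resolve local href="#id" references as chunks arrive.

    Returns (resolved, pending): resolved is a list of (tag, value) pairs
    in the order they could be resolved; pending maps ids that never
    arrived to the tags still waiting on them.
    """
    parser = XMLPullParser(events=("end",))
    targets = {}    # id -> text of the referenced element
    pending = {}    # id -> tags that referenced it before it arrived
    resolved = []
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            href = elem.get("href")
            if href is not None and href.startswith("#"):
                key = href[1:]
                if key in targets:
                    resolved.append((elem.tag, targets[key]))
                else:
                    # Forward reference: the target is not in the stream yet.
                    pending.setdefault(key, []).append(elem.tag)
            elem_id = elem.get("id")
            if elem_id is not None:
                targets[elem_id] = elem.text
                for tag in pending.pop(elem_id, []):
                    resolved.append((tag, elem.text))
    return resolved, pending
```

A reference that precedes its target cannot be acted on when it is seen, so
something has to be deferred. In this sketch only the unresolved references
and the id'd values are held back, but in the worst case that is still most
of the message.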

- James

> -----Original Message-----
> From: Sam Ruby [mailto:rubys@us.ibm.com]
> Sent: Tuesday, January 30, 2001 6:22 AM
> To: axis-dev@xml.apache.org
> Subject: RE: [AXIS ARCH] - Message Internals
>
>
> Yuhichi Nakamura wrote:
> >
> > I just read through this thread.  However, I am not sure
> > how SAX is useful in the context of SOAP message processing.
> > In order to process SOAP messages, we need to "manipulate"
> > XML documents in such a way that header entries are removed,
> > inserted, and potentially modified.  (Body entries might be
> > manipulated in the same manner, but at least header entries
> > MUST be processed by the Axis engine.)
>
> It is my intuition, experience, and reading of the current literature
> that retaining the message in memory is not a scalable solution.  I've
> cited the recent Cocoon rewrite as an example, and pointed to Microsoft
> documentation indicating that they hit a similar problem and outlining
> their solution.
>
> Feel free to disagree with the above.  It is my point of view, perhaps
> there are others out there.
>
> But if you do see the potential for this being a problem, and you have
> any hope for Axis to be successful and therefore deployed in enterprise
> configurations, an alternative must be found.  If not now, it will
> certainly be done in the *next* rewrite.
>
> Avoiding discussions of a specific API for a moment, what is needed is a
> streaming model.  Headers need to be made available to handlers as they
> are being received.  A given handler could choose to do various things
> with this information: pass it along unmodified, choose NOT to pass it
> along (effectively deleting it), or create a new header based on
> information in the original.  In fact, a handler could easily insert a
> new header into the output stream.
>
> There are two basic approaches to streaming: a PUSH model, which SAX
> represents, and a PULL model, which some of the APIs that have been
> submitted to ECMA for standardization represent.  Between these two
> alternatives, James seems to favor a pull model.  I'm inclined to agree.
>
> - Sam Ruby
>
>
>
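
The streaming handler model Sam describes in the quoted message (pass a
header along, drop it, derive a new one, or insert one) could look roughly
like a chain of generators. This is only a sketch: headers are modeled as
(name, value) tuples, and every function name here is hypothetical rather
than part of any Axis API.

```python
def pass_through(headers):
    """Pass every header along unmodified."""
    for header in headers:
        yield header


def drop_header(name):
    """Build a handler that does NOT pass the named header along."""
    def handler(headers):
        for header in headers:
            if header[0] != name:
                yield header
    return handler


def derive_header(source, target, transform):
    """Build a handler that replaces one header with a derived one."""
    def handler(headers):
        for key, value in headers:
            if key == source:
                yield (target, transform(value))
            else:
                yield (key, value)
    return handler


def insert_header(name, value):
    """Build a handler that inserts a new header into the output stream."""
    def handler(headers):
        yield (name, value)
        yield from headers
    return handler


def run_chain(headers, handlers):
    """Connect the handlers so each header flows through as it arrives."""
    stream = iter(headers)
    for handler in handlers:
        stream = handler(stream)
    return stream
```

Because each stage is lazy, a header reaches the last handler before the
next header has been read from the input, which is the property the
streaming model is after.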