Mailing-List: contact axis-dev-help@ws.apache.org; run by ezmlm
Precedence: bulk
Reply-To: axis-dev@ws.apache.org
Received-SPF: neutral (hermes.apache.org: local policy)
Message-ID: <41874FB1.9080908@sosnoski.com>
Date: Tue, 02 Nov 2004 01:13:21 -0800
From: Dennis Sosnoski <dms@sosnoski.com>
User-Agent: Mozilla Thunderbird 0.8 (X11/20040913)
MIME-Version: 1.0
To: axis dev <axis-dev@ws.apache.org>
Subject: [Axis2] OM
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

I spent the weekend catching up with the last couple of months of Axis 
emails and saw some of the activity around the OM. I have a few thoughts 
I wanted to offer on this.

First off, if you really want to keep performance high then I urge you 
not to build a model. I'd instead suggest something like a parse event 
store that you can replay on demand using StAX, SAX, or custom APIs. 
Models are expensive in terms of both time and memory. There's been talk 
of integrating XMLBeans, and I know XMLBeans already has some form of 
backing event store for everything it does. I haven't looked into the 
performance of XMLBeans, but something like that backing store would 
probably be a great basis for what you need (and even has XPath and such 
already implemented on top of it).

I've also implemented a simple parse event store for my XBIS project 
(http://www.xbis.org - the parse event store is currently designed 
around SAX, and can be found in the eventstore package 
http://xbis.sourceforge.net/api/index.html). This gave excellent 
performance (I think replaying the event stream ran at 10X parser speed 
or better) at a reasonable memory cost (about 2X the actual size of the 
document text for the cases I looked at) without much work on 
optimization. Working with even an efficient document model is likely 
to be both considerably slower and considerably heavier in memory 
usage.
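
To make the record-and-replay idea concrete, here is a minimal sketch of 
a SAX-based parse event store. It is not the XBIS eventstore code; the 
class name SaxEventStore and its replay() method are made up for 
illustration, and it skips prefix mappings, processing instructions and 
any attempt at compact storage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical event store: record SAX events once, replay them on demand. */
final class SaxEventStore extends DefaultHandler {
    /** One recorded event, able to re-deliver itself to any handler. */
    private interface Event {
        void replay(ContentHandler handler) throws SAXException;
    }

    private final List<Event> events = new ArrayList<>();

    @Override public void startDocument() {
        events.add(ContentHandler::startDocument);
    }

    @Override public void endDocument() {
        events.add(ContentHandler::endDocument);
    }

    @Override public void startElement(String uri, String local, String qName,
                                       Attributes atts) {
        // The parser may reuse its Attributes object, so take a snapshot.
        Attributes copy = new AttributesImpl(atts);
        events.add(h -> h.startElement(uri, local, qName, copy));
    }

    @Override public void endElement(String uri, String local, String qName) {
        events.add(h -> h.endElement(uri, local, qName));
    }

    @Override public void characters(char[] ch, int start, int length) {
        // The parser's buffer is only valid during the callback, so copy it.
        char[] copy = Arrays.copyOfRange(ch, start, start + length);
        events.add(h -> h.characters(copy, 0, copy.length));
    }

    /** Delivers the recorded events, in order, without re-parsing. */
    public void replay(ContentHandler handler) throws SAXException {
        for (Event e : events) {
            e.replay(handler);
        }
    }
}
```

Pass an instance to a SAX parser as the handler to fill it, then call 
replay() with whatever ContentHandler needs the events, as many times as 
needed. One object per event is the simplest thing that works; a store 
tuned for memory would pack events into arrays instead.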

The real limitation I saw for a parse event store was just that the 
parser APIs are inefficient for working with the data - attributes have 
to be kept as memory-consuming Strings rather than just character 
ranges, and in the case of SAX have to be organized into structures for 
reporting; namespaces are passed in the form of URIs and prefixes rather 
than objects (forcing applications to go through the same work the 
parser has done to associate the two); etc. If you actually designed a 
parse event stream interface rather than working with either SAX or StAX 
you could probably push the efficiency even higher (in other words, use 
the event store as an adapter between the parser and your own internal 
event stream API).
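
As a rough illustration of what such an internal event stream API might 
look like, the sketch below keeps all character data in one buffer, 
reports attribute values and text as ranges of that buffer, and passes 
namespaces as shared objects. Every name here (Namespace, EventSink, 
CompactEventStore) is hypothetical; none of this is an existing Axis, 
StAX or XBIS interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical namespace object: one shared instance per prefix/URI pair. */
final class Namespace {
    final String prefix;
    final String uri;
    Namespace(String prefix, String uri) { this.prefix = prefix; this.uri = uri; }
}

/** Hypothetical consumer API: values arrive as ranges of a shared buffer. */
interface EventSink {
    void startElement(Namespace ns, String name);
    void attribute(Namespace ns, String name, char[] buf, int offset, int length);
    void text(char[] buf, int offset, int length);
    void endElement(Namespace ns, String name);
}

/** Hypothetical store sitting between the parser and the EventSink API. */
final class CompactEventStore {
    private static final int START = 0, ATTR = 1, TEXT = 2, END = 3;

    private final StringBuilder chars = new StringBuilder(); // all character data
    private final List<Namespace> namespaces = new ArrayList<>();
    private final List<String> names = new ArrayList<>();
    private int[] slots = new int[48]; // kind, offset, length for each event
    private int count;

    private void add(int kind, Namespace ns, String name, CharSequence data) {
        if (3 * count + 3 > slots.length) {
            slots = Arrays.copyOf(slots, slots.length * 2);
        }
        slots[3 * count] = kind;
        slots[3 * count + 1] = chars.length();
        slots[3 * count + 2] = data.length();
        chars.append(data);
        namespaces.add(ns);
        names.add(name);
        count++;
    }

    void startElement(Namespace ns, String name) { add(START, ns, name, ""); }
    void attribute(Namespace ns, String name, CharSequence value) { add(ATTR, ns, name, value); }
    void text(CharSequence data) { add(TEXT, null, null, data); }
    void endElement(Namespace ns, String name) { add(END, ns, name, ""); }

    void replay(EventSink sink) {
        // Copied per replay to keep the sketch short; a real store would
        // hold the char[] directly and hand out ranges of it.
        char[] buf = new char[chars.length()];
        chars.getChars(0, buf.length, buf, 0);
        for (int i = 0; i < count; i++) {
            int offset = slots[3 * i + 1];
            int length = slots[3 * i + 2];
            Namespace ns = namespaces.get(i);
            String name = names.get(i);
            switch (slots[3 * i]) {
                case START: sink.startElement(ns, name); break;
                case ATTR:  sink.attribute(ns, name, buf, offset, length); break;
                case TEXT:  sink.text(buf, offset, length); break;
                default:    sink.endElement(ns, name); break;
            }
        }
    }
}
```

A consumer can compare namespaces by identity (ns == soapEnvelope) 
instead of matching URI strings, and only has to build a String for an 
attribute value when it actually needs one.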

  - Dennis