From: Dennis Sosnoski
To: axis dev
Date: Tue, 02 Nov 2004 01:13:21 -0800
Subject: [Axis2] OM
Message-ID: <41874FB1.9080908@sosnoski.com>

I spent the weekend catching up with the last couple of months of Axis emails and saw some of the activity around the OM. I have a few thoughts I wanted to offer on this.

First off, if you really want to keep performance high then I urge you not to build a model. I'd instead suggest something like a parse event store that you can replay on demand using StAX, SAX, or custom APIs. Models are expensive in terms of both time and memory.

There's been talk of integrating XMLBeans, and I know XMLBeans already has some form of backing event store for everything it does. I haven't looked into the performance of XMLBeans, but something like that backing store would probably be a great basis for what you need (and it even has XPath and such already implemented on top of it).

I've also implemented a simple parse event store for my XBIS project (http://www.xbis.org). The parse event store is currently designed around SAX, and can be found in the eventstore package (http://xbis.sourceforge.net/api/index.html). This gave excellent performance (I think replaying the event stream was at least 10X parser speed) at a reasonable memory cost (about 2X the actual size of the document text for the cases I looked at) without much work on optimization. Working with even an efficient document model is likely going to be both considerably slower and considerably heavier in memory usage.

The real limitation I saw for a parse event store was just that the parser APIs are inefficient for working with the data: attributes have to be kept as memory-consuming Strings rather than just character ranges and, in the case of SAX, have to be organized into structures for reporting; namespaces are passed in the form of URIs and prefixes rather than objects (forcing applications to go through the same work the parser has done to associate the two); and so on. If you actually designed a parse event stream interface rather than working with either SAX or StAX you could probably push the efficiency even higher (in other words, use the event store as an adapter between the parser and your own internal event stream API).

 - Dennis
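
To make the parse event store idea concrete, here is a minimal sketch in Java built on SAX. It is not the XBIS eventstore code: the class name EventStore and its methods record, replay, and trace are hypothetical, chosen only for illustration. It stores each event as a plain object with String fields, so it shows the record-once, replay-on-demand shape rather than the compact storage behind the numbers quoted above. Prefix-mapping events and namespace awareness are left out to keep it short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: records SAX events once and replays them on demand. */
public class EventStore extends DefaultHandler {
    private enum Kind { START, END, TEXT }

    /** One recorded event; unused fields stay null. */
    private static final class Event {
        final Kind kind;
        final String uri, local, qName, text;
        final Attributes atts;
        Event(Kind kind, String uri, String local, String qName, Attributes atts, String text) {
            this.kind = kind; this.uri = uri; this.local = local;
            this.qName = qName; this.atts = atts; this.text = text;
        }
    }

    private final List<Event> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // The parser may reuse its Attributes object, so keep a copy.
        events.add(new Event(Kind.START, uri, local, qName, new AttributesImpl(atts), null));
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        events.add(new Event(Kind.END, uri, local, qName, null, null));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser's buffer is only valid during the callback, so copy the text.
        events.add(new Event(Kind.TEXT, null, null, null, null, new String(ch, start, length)));
    }

    /** Replays the stored events to any SAX ContentHandler without reparsing. */
    public void replay(ContentHandler target) throws SAXException {
        target.startDocument();
        for (Event e : events) {
            switch (e.kind) {
                case START: target.startElement(e.uri, e.local, e.qName, e.atts); break;
                case END:   target.endElement(e.uri, e.local, e.qName); break;
                case TEXT: {
                    char[] ch = e.text.toCharArray();
                    target.characters(ch, 0, ch.length);
                    break;
                }
            }
        }
        target.endDocument();
    }

    /** Parses the XML text once (non-namespace-aware, for brevity) into a store. */
    public static EventStore record(String xml) {
        try {
            EventStore store = new EventStore();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), store);
            return store;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    /** Replays the store into a simple text trace, to show a consumer at work. */
    public static String trace(EventStore store) {
        final StringBuilder out = new StringBuilder();
        try {
            store.replay(new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String q, Attributes a) {
                    out.append('<').append(q);
                    for (int i = 0; i < a.getLength(); i++) {
                        out.append(' ').append(a.getQName(i)).append('=').append(a.getValue(i));
                    }
                    out.append('>');
                }
                @Override
                public void endElement(String u, String l, String q) {
                    out.append("</").append(q).append('>');
                }
                @Override
                public void characters(char[] ch, int start, int length) {
                    out.append(ch, start, length);
                }
            });
        } catch (SAXException e) {
            throw new IllegalStateException("replay failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        EventStore store = record("<a x='1'><b>hi</b></a>");
        System.out.println(trace(store)); // first replay
        System.out.println(trace(store)); // second replay, no reparse needed
    }
}
```

The String and AttributesImpl copies in this sketch are exactly the overhead described above as the limitation of the parser APIs; a purpose-built event stream interface could store character ranges and namespace objects instead.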