xml-general mailing list archives

From "Eric Vasilik" <Eric.Vasi...@bea.com>
Subject RE: XMLBeans performance and source code status [Re: Proposal: XMLBeans]
Date Sun, 06 Jul 2003 00:55:49 GMT
When working with XMLBeans in a strongly typed way (with a Schema), individual objects are
created for each piece of information, usually instances of simple and complex Schema types.
 However, you can also access and manipulate the XML in a typeless manner.  What we've done
with XMLBeans is provide access to the full XML Infoset via the XmlCursor interface.

XmlCursor provides functionality very similar to the DOM, but takes a very different tack.
 Instead of creating a DOM Node for each element, attribute, text, etc., one may create a
single XmlCursor and navigate that cursor about the XML instance, interrogating the XML: element/attr
names, child/parent elements, text, comments, etc.  One may also modify the XML, for example by
removing elements and attrs or inserting text.  All of this can be done either without creating
objects or by reusing objects, so that the number of objects needed to operate on the XML is constant,
not on the order of the size of the XML as a DOM would require.
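
To make the "one cursor, constant number of objects" idea concrete, here is a toy sketch in Java. It is not the XmlCursor API and not how XMLBeans is implemented; the names TokenStore, ToyCursor and CursorSketch are made up for illustration. The store is a flat list of tokens, and the traversal allocates a single cursor no matter how many elements the document has.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical flat token store; an illustration only, not the XMLBeans store. */
final class TokenStore {
    enum Kind { START, END, TEXT }

    private final List<Kind> kinds = new ArrayList<>();
    private final List<String> values = new ArrayList<>();

    TokenStore start(String name) { kinds.add(Kind.START); values.add(name); return this; }
    TokenStore text(String s)     { kinds.add(Kind.TEXT);  values.add(s);    return this; }
    TokenStore end()              { kinds.add(Kind.END);   values.add("");   return this; }

    int size()          { return kinds.size(); }
    Kind kindAt(int i)  { return kinds.get(i); }
    String valueAt(int i) { return values.get(i); }
}

/** A cursor is only a position in the store; moving it allocates nothing. */
final class ToyCursor {
    private final TokenStore store;
    private int pos = 0;

    ToyCursor(TokenStore store) { this.store = store; }

    /** Advances to the next token; returns false when already at the last one. */
    boolean toNextToken() {
        if (pos + 1 >= store.size()) return false;
        pos++;
        return true;
    }

    TokenStore.Kind kind() { return store.kindAt(pos); }
    String value()         { return store.valueAt(pos); }
}

final class CursorSketch {
    /** Counts element starts using one cursor object for the whole traversal. */
    static int countElements(TokenStore store) {
        if (store.size() == 0) return 0;
        ToyCursor c = new ToyCursor(store);
        int n = 0;
        do {
            if (c.kind() == TokenStore.Kind.START) n++;
        } while (c.toNextToken());
        return n;
    }

    /** Tokens for: <order><item>pen</item><item>ink</item></order> */
    static TokenStore sample() {
        return new TokenStore()
            .start("order")
            .start("item").text("pen").end()
            .start("item").text("ink").end()
            .end();
    }

    public static void main(String[] args) {
        System.out.println(countElements(sample()));
    }
}
```

A DOM-style traversal of the same document would hand back a node object per element and per text run; here the only per-traversal allocation is the ToyCursor itself.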

This kind of interface allows an implementer of an in-memory XML store more freedom in choosing
the internal structure which represents the XML in memory.  One could, for example, simply
store the XML exactly as it was read in from disk and implement a cursor as an index
into that string, parsing or modifying the parts of the string as necessary to satisfy the
requests.  We don't go to quite this extreme.  In principle, we create one object for every
leaf element or attribute and two objects for every interior element.  All text for attribute
values, comments, processing instructions and text between element markup is stored in a single
character array.
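
As a rough illustration of batching text into a single character array, the sketch below keeps every string in one shared char[] and hands out small integer handles. The class name TextPool and its layout (parallel offset/length arrays) are assumptions made for this example; the actual XMLBeans store is not described in that detail here.

```java
import java.util.Arrays;

/** Hypothetical text pool: all text lives in one char[], addressed by handle. */
final class TextPool {
    private char[] chars = new char[16];
    private int used = 0;

    // Parallel arrays: where each piece of text starts and how long it is.
    private int[] offsets = new int[4];
    private int[] lengths = new int[4];
    private int count = 0;

    /** Copies the text into the shared array and returns a handle to it. */
    int add(String s) {
        int needed = used + s.length();
        if (needed > chars.length) {
            chars = Arrays.copyOf(chars, Math.max(chars.length * 2, needed));
        }
        s.getChars(0, s.length(), chars, used);

        if (count == offsets.length) {
            offsets = Arrays.copyOf(offsets, count * 2);
            lengths = Arrays.copyOf(lengths, count * 2);
        }
        offsets[count] = used;
        lengths[count] = s.length();
        used = needed;
        return count++;
    }

    /** Materializes a String only when a caller actually asks for one. */
    String get(int handle) {
        if (handle < 0 || handle >= count) {
            throw new IndexOutOfBoundsException("bad handle: " + handle);
        }
        return new String(chars, offsets[handle], lengths[handle]);
    }

    int totalChars() { return used; }
}
```

Loading a document this way performs a few array growths instead of one String allocation per text run, which is one plausible reading of why batching text helps load time.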

We have found that creating fewer objects and batching text leads to loading the XML into
memory faster as well as having a similar, if not slightly smaller, memory footprint when
compared to the DOM.  Also, working with cursors seems to be an easier programming model than
the DOM as it does not have text nodes and is more intuitive.

With respect to the synchronized access, the strongly typed schema XMLBeans objects cache
values so that conversion to text does not occur until it is needed.  Likewise, when modifications
are made to the XML Infoset, the strongly typed data (ints, for example) are not parsed from
the text until requested.  In general the impact of synchronization is quite low because of
the lazy approach we have taken along with the caching.  As I read your question again, I
realize that you may have interpreted synchronized to mean "managing data among several threads".
 The synchronization described refers to the fact that one may manipulate the XML via the
XmlCursor or the strongly typed XMLBean classes generated from the schema, each mechanism
capable of seeing the changes from the other in a tightly integrated way.
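
The lazy two-way caching described above can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the generated XMLBeans classes: LazyIntValue is a made-up name, and a single field stands in for both the cursor's text view and the typed view. Whichever side is written last is authoritative, and the other side is recomputed only when it is read.

```java
/** Hypothetical value holder with a text view and a typed int view. */
final class LazyIntValue {
    private String text;        // lexical form, as a cursor would see it
    private int value;          // typed form, as a generated class would see it
    private boolean textValid;
    private boolean valueValid;
    private int parseCount = 0; // only here to make the laziness observable

    LazyIntValue(String text) {
        this.text = text;
        this.textValid = true;
    }

    /** Cursor-side write: the cached int becomes stale but is not reparsed yet. */
    void setText(String newText) {
        text = newText;
        textValid = true;
        valueValid = false;
    }

    /** Typed-side write: the text becomes stale but is not regenerated yet. */
    void setInt(int newValue) {
        value = newValue;
        valueValid = true;
        textValid = false;
    }

    /** Parses the text only on the first read after a text change. */
    int getInt() {
        if (!valueValid) {
            value = Integer.parseInt(text.trim());
            valueValid = true;
            parseCount++;
        }
        return value;
    }

    /** Converts to text only on the first read after a typed change. */
    String getText() {
        if (!textValid) {
            text = Integer.toString(value);
            textValid = true;
        }
        return text;
    }

    int parseCount() { return parseCount; }
}
```

Each view sees the other's changes, yet repeated reads cost nothing extra and a write followed by no read of the other view costs no conversion at all.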

With respect to building XMLBeans, we plan to remove any dependency upon the jars you mentioned.
 Indeed, there exists very little dependence on these.  Mostly just interfaces, not any classes
needed for the implementation.

- Eric Vasilik

-----Original Message-----
From: Aleksander Slominski [mailto:aslom@cs.indiana.edu]
Sent: Friday, July 04, 2003 8:31 PM
To: general@xml.apache.org
Cc: Jakarta General List; general@incubator.apache.org
Subject: XMLBeans performance and source code status [Re: Proposal:
XMLBeans]


Cliff Schmidt wrote:

>>What's compelling about XMLBeans compared to some of the other front
>>runners, such as JDOM and XOM, Castor and JAXB?
>>    
>>
>
>The main difference between XMLBeans and JDOM or XOM is that XMLBeans
>does not create objects for each XML information item.  Instead, it 
>provides cursor-based access to each item in the XML Infoset.  It has
>an architecture where, if an actual object is needed for a node, it 
>can be created on-demand.  We found this provided great performance 
>benefit.  
>
hi,

i am interested to find if you have some more details on performance 
benefits - it seems to be very intriguing and distinguishing feature of 
XMLBeans.

i may be missing something but i tried to find this information online 
without any luck (i checked 
http://dev2dev.bea.com/articles/hitesh_seth.jsp that is good overview 
but has not enough technical details and other docs): as far as i can 
understand actual objects are created for every XML information item? so 
as objects are in memory the same way as objects in DOM what performance 
benefits do you have in mind? do you refer to faster creation time or 
lower memory footprint? did you check for example on the same machine 
how big XML document can be loaded with XMLBeans and DOM (for example 
Xerces2) before running out of memory?

>The biggest differences between XMLBeans and Castor or JAXB
>are:
>1) the goal of 100% Schema support (currently supports everything in 
>Schema other than redefine and substitution groups, and those features
>are nearly ready), and 
>2) the integrated and synchronized access of the underlying XML content
>with strongly typed Java classes.
>
did you estimate what is impact of requiring synchronized access? i am 
really curious why it is required. i can see need to share XML 
schemas but why to require synchronizing access to XML content? i would 
think that approach from java.util where collections are not thread-safe 
until specifically made synchronized could work here as well?
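
For reference, the java.util approach mentioned here looks roughly like this: ArrayList is not thread-safe by itself, and a caller opts in to locking with Collections.synchronizedList. The class SyncOptInSketch and its fill helper are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SyncOptInSketch {
    /** Has several threads append to the list, then returns its final size. */
    static int fill(List<Integer> target, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    target.add(i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return target.size();
    }

    public static void main(String[] args) {
        // Thread safety is opt-in: wrap the plain list explicitly.
        List<Integer> safe = Collections.synchronizedList(new ArrayList<>());
        System.out.println(fill(safe, 4, 1000));
    }
}
```

Passing a bare ArrayList to the same method is a data race and may lose elements or throw, which is the cost single-threaded callers avoid paying when locking is not built in.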

>>I'd say you'd want to do as much setup before incubation as possible.
>>This includes normalizing your code layout (something that didn't
>>materialize for Tapestry, unfortunately) to match the other Jakarta
>>projects (this will ease things if and when you transition to Maven
>>builds).  You probably want to check out a bit about Gump as well ...
>>I can think of one person who will probably veto you until you are
>>integrated into Gump.  It's *exceptionally* painful to work with Gump
>>at the moment, but ultimately worth it.  
>>    
>>
i have question concerning Gump bit in general what is on Wiki page 
http://nagoya.apache.org/wiki/apachewiki.cgi?XmlBeansProposal:

(...) '''(2) identify the initial source from which the subproject is to be populated'''

*http://workshop.bea.com/xmlbeans/XsdUpload.jsp

(...)

i looked on source code and it seems that it is not possible to rebuild 
xbean.jar just from source and it is not clear what are dependencies?

i noticed there are parts of code that depends on outside packages (like 
weblogic.xml.stream.XMLInputStream or com.bea.xquery) and some 
subpackages that are in com.bea.xml* that are in xbean.jar but not in 
src directory?

what are plans for those pieces of code - are they also open source or 
XMLBeans would depend on BEA implementation classes to be on CLASSPATH 
to compile it?

i hope XmlBeans will be actively developed as open source (in Apache or 
outside) so it continues to grow as it really looks like an interesting 
project.

thanks,

alek

-- 
If everything seems under control, you're just not going fast enough. —Mario Andretti




