My two cents...
I recommend that the group focus back on the use cases (if there are any) that the software is designed to support.  For example, in an e-commerce context, XML documents generally run between 2K and 200K for 95% of transactions (purchase orders, invoices, etc.).  Therefore, I recommend that the goals of small, fast, and efficient hold, while supporting DOM (since node traversal is a common need) under a model of less than a MB of data (or some similar threshold).  As for memory in general, there should be an XMLBeans road map that considers the industry trend toward more memory at a lower cost (remember a few months ago when you thought one GB was a lot of memory?).  Factoring that in may make a big difference in establishing the target threshold.

>>> 09/26/03 04:42PM >>>

You called this morning with a difficult design problem that you're facing
with v2 store given the features listed on the feature page, and I'm
summarizing here.  Perhaps somebody reading here will have some ideas.

Some of the problems that need to be solved are:
(1) Support DOM in addition to our cursor API
(2) Work with very large payloads without running out of RAM
(3) Keep us small, keep us fast.  That means trying to reduce object
allocation, and trying to avoid slower things like synchronized{} blocks.
(4) When dealing with read-only data, a naive multithreaded user should be
able to assume that they do not need to synchronize reads. (This is not on
the feature list, but seems like an important API property.)

But when you put together (1), (2), (3), and (4), you get some fundamental
tensions.

Here's the tension:

(a) The DOM API (1) implies many more objects than you actually need.  For
example, who really cares about the whitespace between tags in a typical
app?  And if you can bind directly to "int", who really wants to ever
allocate the string object that contains "413231"?  So that's in conflict
with goal (3), being small, unless we build a "lazy DOM" that creates
objects on demand.
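To make the "lazy DOM" idea concrete, here's a minimal sketch (all names are hypothetical, not real XMLBeans API): the store keeps only raw character data after parsing, allocates a node object the first time the DOM API asks for it, and lets typed access like "int" skip the node and string allocation entirely:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a lazy store: raw text is kept after parsing,
// but node objects are only created when the DOM side first asks.
class LazyStore {
    static class Node {
        final String text;
        Node(String text) { this.text = text; }
    }

    private final Map<Integer, String> rawValues = new HashMap<>();
    private final Map<Integer, Node> nodeCache = new HashMap<>();

    LazyStore() {
        // Character data captured at parse time; no Node objects yet.
        rawValues.put(0, "413231");
    }

    // A Node (and its String wrapper) is allocated only on first access,
    // then cached so repeated reads return the same object.
    Node getNode(int index) {
        return nodeCache.computeIfAbsent(index,
                i -> new Node(rawValues.get(i)));
    }

    // Binding directly to int never allocates the Node or the String
    // object that the DOM API would have forced into existence.
    int getInt(int index) {
        return Integer.parseInt(rawValues.get(index));
    }
}
```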

(b) Dealing with very large instances (2) also seems to lead to "lazy
objects" created on demand.  For example, if the bulk of a 20GB instance is
stored on disk, yet an app can hold on to an object that represents a node,
then certainly not all nodes can be in memory at once.  They're created on
demand.

(c) But creating objects on demand means that read operations mutate the
underlying data structure.  This is in conflict with goal (4); that is,
multiple readers on multiple threads would need to synchronize against each
other, unless we synchronize for them.  But if we synchronize for them,
that's again in conflict with goal (3).
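A small sketch of why (c) bites (hypothetical names, not real API): the accessor looks like a pure read to the caller, but it writes into the node cache on first access, so the store must lock on the readers' behalf if naive multithreaded reads are to be safe:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a lazy "read" secretly mutates the cache, so
// two unsynchronized reader threads would race on the HashMap unless
// the store synchronizes for them -- which is exactly the cost that
// conflicts with goal (3).
class SafeLazyStore {
    private final Map<Integer, Object> nodeCache = new HashMap<>();

    // Looks like a read, but may create and cache the node object;
    // synchronized so that naive concurrent readers stay safe.
    synchronized Object getNode(int index) {
        return nodeCache.computeIfAbsent(index, i -> new Object());
    }
}
```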

(d) The upshot: it seems like
- we need to synchronize at a low level to satisfy (4), but at the same time,
- to satisfy (3) - i.e., no synchronization cost - perhaps we should have a
global option per instance to turn off synchronization; users can use this
option if they are synchronizing themselves in a savvy multithreaded app, or
if they are truly single-threaded.

That last bullet is a bit clumsy.  But I don't see anything better....
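For what it's worth, the per-instance option could look something like this (purely illustrative; the names and constructor are assumptions, not a proposed API): the store locks internally by default, and a savvy or single-threaded caller can opt out and pay no synchronization cost:

```java
// Hypothetical sketch of a per-instance synchronization switch.
// Callers who synchronize themselves (or are truly single-threaded)
// construct the store with synchronize=false and skip the lock.
class TunableStore {
    private final boolean synchronize;
    private final Object lock = new Object();
    private Object cachedNode; // created lazily on first read

    TunableStore(boolean synchronize) {
        this.synchronize = synchronize;
    }

    Object getNode() {
        if (synchronize) {
            // Default: the store protects naive concurrent readers.
            synchronized (lock) { return loadNode(); }
        }
        // Opt-out: caller has promised external synchronization.
        return loadNode();
    }

    private Object loadNode() {
        if (cachedNode == null) cachedNode = new Object();
        return cachedNode;
    }
}
```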


