From: "Darrell Teague"
Date: Tue, 30 Sep 2003 10:44:40 -0400
Subject: Re: V2 store
Mailing-List: xmlbeans-dev@xml.apache.org

My two cents...
I recommend that the group focus back on the use cases (if there are any) that the software is designed to support. For example, in an e-commerce context, XML documents are generally between 2K and 200K for 95% of the transactions (Purchase Orders, Invoices, etc). Therefore, I recommend that the goals of small, fast and efficient hold, while supporting DOM (since node traversal is a common need) under a model of less than a MB of data (or some threshold). As for memory in general, there should be an XMLBeans road map that considers the industry trend toward more memory at a lower cost (remember a few months ago when you thought one GB was a lot of memory?). Factoring that in may make a big difference in establishing the target threshold.

>>> david.bau@bea.com 09/26/03 04:42PM >>>
Eric,

You called this morning with a difficult design problem that you're facing
with v2 store given the features listed on the feature page, and I'm
summarizing here.  Perhaps somebody reading here will have some ideas.

Some of the problems that need to be solved are:
(1) Support DOM in addition to our cursor API
(2) Work with very large payloads without running out of RAM
(3) Keep us small, keep us fast.  That means try to reduce object
allocation, and try to avoid slower things like synchronized{} blocks.
(4) When dealing with read-only data, a naive multithreaded user should be
able to assume that they do not need to synchronize reads. (This is not on
the feature list, but seems like an important API property.)

But when you put together (1), (2), (3) and (4), you get some fundamental
tensions:
(a) The DOM API (1) implies many more objects than you actually need.  For
example, who really cares about the whitespace between tags in a typical
app?  And if you can bind directly to "int", who really wants to ever
allocate the string object that contains "413231"?  So that's in conflict
with goal (3), being small, unless we build a "lazy DOM" that creates
objects on demand.
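
To make the "lazy DOM" idea in (a) concrete, here is a minimal Java sketch of a node that keeps only the bound int and allocates its string form on first request. The names (LazyIntNode, intValue, stringValue, isTextMaterialized) are hypothetical, chosen for illustration; this is not XMLBeans or v2 store code.

```java
// Hypothetical sketch, not XMLBeans code: a node that stores a parsed int
// and allocates its text form only if a DOM-style caller asks for it.
final class LazyIntNode {
    private final int value;   // compact bound value, always present
    private String text;       // DOM-style string form, created on demand

    LazyIntNode(int value) {
        this.value = value;
    }

    // Binding-style access: returns the int without allocating anything.
    int intValue() {
        return value;
    }

    // DOM-style access: allocates the string on first use, then reuses it.
    String stringValue() {
        if (text == null) {
            text = Integer.toString(value);
        }
        return text;
    }

    // Exposed only so the sketch can show whether the string exists yet.
    boolean isTextMaterialized() {
        return text != null;
    }
}
```

A caller that only ever uses intValue() never pays for the string. Note that stringValue() writes a field on what looks like a read, and the sketch makes no attempt at thread safety.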

(b) Dealing with very large instances (2) also seems to lead to "lazy
objects" created on demand.  For example, if the bulk of a 20GB instance is
stored on disk, yet an app can hold on to an object that represents a node,
then certainly not all nodes can be in memory at once.  They're created on
demand.

(c) But creating objects on demand means that read operations mutate the
underlying data structure.  This is in conflict with goal (4): multiple
readers on multiple threads need to synchronize against each other,
unless we synchronize for them.  But if we synchronize for them, that's
again in conflict with goal (3).

(d) The upshot: it seems like
- we need to synchronize at a low level to satisfy (4) at the same time as
allocate-on-demand
- to satisfy (3), i.e., no synchronization cost, perhaps we should have a
global option per instance to turn off synchronization; users can use this
option if they are synchronizing themselves in a savvy multithreaded app, or
if they are truly single-threaded.
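
One possible shape for the per-instance option in (d), as a rough Java sketch. LazyStore, its constructor flag, and its methods are assumptions made up for illustration; nothing here reflects an actual XMLBeans API.

```java
// Hypothetical sketch, not XMLBeans code: a store whose locking behavior
// is fixed per instance when the instance is created.
final class LazyStore {
    private final boolean synchronizeReads;
    private final Object lock = new Object();
    private Object root;   // created on demand

    LazyStore(boolean synchronizeReads) {
        this.synchronizeReads = synchronizeReads;
    }

    Object root() {
        if (synchronizeReads) {
            // Default-style behavior: the store locks on behalf of the caller.
            synchronized (lock) {
                return materializeRoot();
            }
        }
        // Opt-out: the caller has promised single-threaded use
        // or does its own locking around the store.
        return materializeRoot();
    }

    boolean synchronizesReads() {
        return synchronizeReads;
    }

    private Object materializeRoot() {
        if (root == null) {
            root = new Object();
        }
        return root;
    }
}
```

With new LazyStore(true) a naive caller gets safe reads; new LazyStore(false) skips the lock entirely and leaves correctness to the caller. Whether the flag check itself is cheap enough for goal (3) would have to be measured.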

That last bullet is a bit clumsy.  But I don't see anything better....

Thoughts?

David


---------------------------------------------------------------------
To unsubscribe, e-mail:   xmlbeans-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xmlbeans-dev-help@xml.apache.org
Apache XMLBeans Project -- URL: http://xml.apache.org/xmlbeans/
