abdera-dev mailing list archives

From James M Snell <jasn...@gmail.com>
Subject Re: Understanding Incremental Parsing [was Re: failing parser test]
Date Tue, 09 Oct 2007 20:02:26 GMT
The incremental parser model ensures that only the objects we actually
need are loaded into memory.  A better name for it would be
parse-on-demand.  Think of it as a hybrid between the SAX and DOM
approaches.  The main advantage of this approach is that it uses
significantly less memory than DOM.  Another advantage is that it means
we can introduce filters into the parsing process so that unwanted
elements are ignored completely (that's the ParseFilter stuff you see in
the core).  To illustrate the difference, a while back we used ROME
(which uses JDOM) to parse Tim Bray's Atom feed and output just titles
and links to System.out.  We used Abdera with a parse filter to do the
exact same test.  The JDOM approach used over 6 MB of memory; the Abdera
approach used roughly 700 KB.  The Abdera approach was
significantly faster as well.
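
To make the filtering idea concrete, here is a rough sketch of the same
"titles and links only" task.  Note that it deliberately does NOT use
Abdera's ParseFilter API; it uses the JDK's StAX pull parser
(javax.xml.stream) as a stand-in, and the class and method names
(FilteredPullParse, titlesAndLinks, skipSubtree) are made up for
illustration.  It assumes Atom titles are plain text.  It only shows the
general principle of walking the stream and never building objects for
elements that are not wanted; it says nothing about the memory numbers
above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative only: a StAX-based stand-in for a whitelist parse filter.
public class FilteredPullParse {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Collects Atom titles and link hrefs; every other subtree is skipped
    // without creating any objects for it.
    static List<String> titlesAndLinks(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        try {
            while (r.hasNext()) {
                if (r.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = r.getLocalName();
                boolean atom = ATOM_NS.equals(r.getNamespaceURI());
                if (atom && (name.equals("feed") || name.equals("entry"))) {
                    // Containers we descend into; children are examined next.
                    continue;
                } else if (atom && name.equals("title")) {
                    // Assumes text-only titles; getElementText() throws on
                    // nested markup (e.g. type="xhtml").
                    out.add("title: " + r.getElementText());
                } else if (atom && name.equals("link")) {
                    out.add("link: " + r.getAttributeValue(null, "href"));
                    skipSubtree(r);
                } else {
                    // Unwanted element: still tokenized, but nothing is kept.
                    skipSubtree(r);
                }
            }
        } finally {
            r.close();
        }
        return out;
    }

    // Advances the reader from a START_ELEMENT to its matching END_ELEMENT.
    static void skipSubtree(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<feed xmlns=\"" + ATOM_NS + "\"><title>Example</title>"
                + "<entry><title>First</title>"
                + "<link href=\"http://example.org/1\"/>"
                + "<content>skipped</content></entry></feed>";
        for (String line : titlesAndLinks(xml)) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is the final else branch: anything not on the
whitelist is stepped over as a stream of events, so no tree is ever
built for it.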

- James

Dan Diephouse wrote:
> Was wondering if someone could answer a quick question on the
> incremental parsing business just so I can be sure I fully get things.
> As I understand most parts of the abdera model (at least the impl) are
> built on an Axiom OMElementImpl. As far as incremental parsing is
> concerned, the thing that this is buying Abdera is that Axiom can
> discard nodes later on, right? i.e. I can read entry 1, then move to
> entry 2, and entry 1 will leave memory? If so, how is that turned on?
> - Dan
> James M Snell wrote:
>> Forcing a clone is the wrong thing to do, but we could introduce a
>> method that would force the parse to complete without creating a bunch
>> of duplicate objects. FWIW, that could be done today by calling
>> toString() rather than clone.
>> - James
>> Ugo Cei wrote:
>>> On Oct 8, 2007, at 9:10 PM, Dan Diephouse wrote:
>>>> I think this test should be disabled for now. I don't think it's good
>>>> policy to just leave a failing test in the build. The build should
>>>> *always* build and *always* run the tests IMO.  The issue can just be
>>>> marked as a blocker for the release and revisited when time/priorities
>>>> permit. As a user and developer it's very frustrating to find a build
>>>> that doesn't work (like the maven build in abdera currently).
>>> I am always fighting with myself over issues like this one, but in this
>>> case I think you are right, so I've put the workaround in place to make
>>> the test succeed.
>>> I also agree with Garrett that this should be considered a bug: it's
>>> just too easy for users to fall into it and bang their heads against a
>>> wall for a few hours before they realize this is the way the code is
>>> actually supposed to work and implement the workaround in their own
>>> code.
>>> OTOH, I don't know how easy this would be to fix: maybe by keeping track
>>> of partially-parsed documents and calling clone() internally when a
>>> modification attempt is detected? Sounds messy.
>>>     Ugo
