From "Stefan Guggisberg" <stefan.guggisb...@gmail.com>
Subject Re: OutOfMemory - adding lots of nodes in one session
Date Fri, 01 Sep 2006 13:13:53 GMT
hi michael

i quickly ran a test which successfully added 20k child nodes to the same
parent (whether that's a useful content model is a different story...).

here's the code i used to test:

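    Node parent = root.addNode("foo", "nt:unstructured");
    for (int i = 1; i <= 20000; i++) {
        parent.addNode("bar" + i);  // child name is a placeholder
        if (i % 1000 == 0) {
            root.save();  // persist the pending batch of 1000 nodes
            System.out.println("added 1000 child nodes; total=" + i);
        }
    }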

note that save() is a relatively expensive operation; it therefore makes sense
to batch multiple addNode() etc. calls (which are relatively inexpensive)
before each save().
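
a minimal, dependency-free sketch of that batching idea follows. it is not the
test code from this mail: NodeStore, CountingStore and addInBatches are
hypothetical names that stand in for the real JCR addNode()/save() calls, so the
example runs without a repository. the batch size of 1000 matches the loop
above, and the final flush for a partial batch is an addition.

```java
// Sketch only: NodeStore is a hypothetical stand-in, not part of javax.jcr or Jackrabbit.
class BatchedInsert {

    /** Hypothetical minimal view of "add a child" and "persist pending changes". */
    interface NodeStore {
        void addNode(String name);  // cheap: records a transient change
        void save();                // expensive: persists everything pending
    }

    /** In-memory store that only counts calls, so the sketch runs anywhere. */
    static class CountingStore implements NodeStore {
        int added = 0;    // total addNode() calls
        int pending = 0;  // additions not yet saved
        int saves = 0;    // total save() calls

        public void addNode(String name) { added++; pending++; }
        public void save() { saves++; pending = 0; }
    }

    /** Adds 'total' children, calling save() once per 'batchSize' additions. */
    static void addInBatches(NodeStore store, int total, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int i = 1; i <= total; i++) {
            store.addNode("child" + i);
            if (i % batchSize == 0) {
                store.save();  // one expensive call per full batch
            }
        }
        if (total > 0 && total % batchSize != 0) {
            store.save();      // flush the final partial batch
        }
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        addInBatches(store, 20000, 1000);
        System.out.println("added=" + store.added + " saves=" + store.saves);
    }
}
```

a smaller batch size lowers the number of unsaved changes held at once but
raises the number of save() calls, so the right value is a trade-off to measure
rather than a fixed rule.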

please provide a simple self-contained test case that reproduces the behaviour
you're describing.


On 9/1/06, Michael Neale <michael.neale@gmail.com> wrote:
> 1:
> yeah I use JProfiler - top of the charts with a bullet was:
> org.apache.jackrabbit.util.WeakIdentityCollection$WeakRef (a ha ! that would
> explain the performance slug when GC has to kick in late in the piece).
> followed by:
> org.apache.derby.impl.store.raw.data.StoredRecordHeader
> and of course a whole lot of byte[].
> I am using default everything (which means Derby) and no blobs whatsoever
> (so all in the database).
> 2:
> If I logout, and use fresh everything, it seems to continue fine (ie fast
> enough pace), but I haven't really pushed it where I wanted to get it (10000
> Child nodes).
> Responding to Alexandru's email (hi alex, nice work on InfoQ if I remember
> correctly ! I am a fan), it would seem that the Session keeps most in
> memory, which I can understand.
> I guess my problem is that I am trying to load up the system to test,
> basically, that it scales to the numbers that I know I need to scale to, but
> I am having trouble getting the data in - bulk load wise. If I bump up the
> memory, it certainly seems to hum along better, but if Session is keeping a
> lot around, then this will have limits - there is no way to "clear" the
> session ?
> Perhaps I will explain what I am using JCR for (feel free to smack me down
> if this is not what JCR and Jackrabbit are ever intended for):
> I am storing "atomic business rules" (which means each node is a small
> single business rule). The data on each node is very small. These nodes are
> stored flat as child nodes under a top level node. To give structure
> (categorisation) for the users, I have references to these nodes all over
> the place so people can navigate them all sorts of different ways (as there
> is no one clear hierarchy at the time the rules are created). JCR gives me
> most of what I need, but as these rule nodes can number in the thousands
> (4000 is not uncommon for a reasonably complex business unit), I am
> worried that this just can't work.
> I have seen from past posts that people put nodes under different parents
> (so there is no great number of child nodes) so that is one option, but my
> gut feel is that it's the WeakIdentityCollection: this well meaning code
> means that the GC has to do a huge amount of work at the worst possible
> time (when under stress). I am sure most of the time this is not an issue.
> Any ideas/tips/gotchas for a newbie? I would really like to be confident
> that I can scale up enough (it's modest) with JCR for this purpose.
> On 8/31/06, Nicolas <ntoper@gmail.com> wrote:
> >
> > 2 more ideas:
> >
> > 1/ Did you try using a memory profiler so we can know what is wrong?
> >
> > 2/ What happens if you logout after say 100 updates?
> >
> >
> > a+
> > Nico

