jackrabbit-users mailing list archives

From "Michael Neale" <michael.ne...@gmail.com>
Subject Re: OutOfMemory - adding lots of nodes in one session
Date Mon, 04 Sep 2006 09:27:15 GMT
hi Stefan.

Yes, I was able to make it rip through saving lots of simple nodes like that,
no problem. When I add more properties it degrades a fair bit (probably not
surprising, if I guess at how the data is being stored for each property).

Interestingly, when I use my own specific node type it slows down quite a
lot (and memory consumption goes up) compared with nt:unstructured, even
though all other properties are set in the same way. I had to bump up the
memory quite a lot to avoid OutOfMemoryExceptions.

In the end, when I batched things up, I was able to ramp up the number of
nodes to what I wanted to test. Performance was acceptable once it was all
loaded - it is definitely the save() operations that are the most expensive.
It was just very difficult to build up my test data without running out of
memory.
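
For reference, the batched load ended up looking roughly like this - just a
sketch, where 'session' and the node type "my:rule" stand in for my actual
setup:

    Node parent = session.getRootNode().addNode("rules", "nt:unstructured");
    for (int i = 1; i <= 10000; i++) {
        Node rule = parent.addNode("rule" + i, "my:rule");
        rule.setProperty("name", "rule " + i);
        rule.setProperty("content", "when ... then ...");
        if (i % 500 == 0) {
            session.save(); // the expensive call - do it once per batch
        }
    }
    session.save(); // persist the remainder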

Thanks everyone for your help - I have learned a lot about Jackrabbit in the
process.

On 9/1/06, Stefan Guggisberg <stefan.guggisberg@gmail.com> wrote:
>
> hi michael
>
> i quickly ran a test which successfully added 20k child nodes to the same
> parent (whether that's a useful content model is a different story...).
>
> here's the code i used to test:
>
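>     // 'root' here is the session's root node, obtained beforehand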
>     Node parent = root.addNode("foo", "nt:unstructured");
>     for (int i = 1; i <= 20000; i++) {
>         parent.addNode("bar");
>         if (i % 1000 == 0) {
>             root.save();
>             System.out.println("added 1000 child nodes; total=" + i);
>         }
>     }
>
> note that save() is a relatively expensive operation; it therefore makes sense
> to batch multiple addNode() etc. calls (which are relatively inexpensive).
>
> please provide a simple self-contained test case that reproduces the behaviour
> you're describing.
>
> cheers
> stefan
>
> On 9/1/06, Michael Neale <michael.neale@gmail.com> wrote:
> > 1:
> > yeah, I use JProfiler - top of the charts with a bullet was
> > org.apache.jackrabbit.util.WeakIdentityCollection$WeakRef (aha! that would
> > explain the sluggish performance when GC has to kick in late in the piece),
> > followed by
> > org.apache.derby.impl.store.raw.data.StoredRecordHeader
> > and of course a whole lot of byte[].
> >
> > I am using default everything (which means Derby) and no blobs whatsoever
> > (so all in the database).
> >
> > 2:
> > If I logout and use fresh everything, it seems to continue fine (i.e. at a
> > fast enough pace), but I haven't really pushed it to where I wanted to get
> > it (10,000 child nodes).
> >
> > Responding to Alexandru's email (hi Alex, nice work on InfoQ if I remember
> > correctly! I am a fan), it would seem that the Session keeps most of it in
> > memory, which I can understand.
> >
> > I guess my problem is that I am trying to load up the system to test, quite
> > basically, that it scales to the numbers I know I need to scale to, but I
> > am having trouble getting the data in, bulk-load wise. If I bump up the
> > memory it certainly seems to hum along better, but if the Session is
> > keeping a lot around then this will have limits - is there no way to
> > "clear" the session?
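> >
> > Something like the following is what I have in mind - just a sketch, with
> > 'repository' and 'credentials' standing in for my actual setup - recycling
> > the session every batch so it cannot hold on to everything:
> >
> >     Session session = repository.login(credentials);
> >     Node parent = session.getRootNode().getNode("rules"); // assumes /rules exists
> >     for (int i = 1; i <= 10000; i++) {
> >         parent.addNode("rule" + i);
> >         if (i % 1000 == 0) {
> >             session.save();
> >             session.logout();                        // release whatever the session cached
> >             session = repository.login(credentials); // carry on with a fresh session
> >             parent = session.getRootNode().getNode("rules"); // re-fetch from the new session
> >         }
> >     }
> >     session.save();
> >     session.logout();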
> >
> > Perhaps I will explain what I am using JCR for (feel free to smack me down
> > if this is not what JCR and Jackrabbit were ever intended for): I am
> > storing "atomic business rules", meaning each node is a single small
> > business rule. The data on each node is very small. These nodes are stored
> > flat as child nodes under a top-level node. To give structure
> > (categorisation) for the users, I have references to these nodes all over
> > the place so people can navigate them in all sorts of different ways (as
> > there is no one clear hierarchy at the time the rules are created). JCR
> > gives me most of what I need, but as these rule nodes can number in the
> > thousands (4000 is not uncommon for a reasonably complex business unit),
> > I am worried that this just can't work.
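> >
> > In code terms the model is roughly this (the names are made up):
> >
> >     Node rules = session.getRootNode().getNode("rules");
> >     Node rule = rules.addNode("discount-rule", "my:rule"); // one small node per rule
> >     rule.setProperty("content", "when ... then ...");
> >     rule.addMixin("mix:referenceable"); // needed before the node can be referenced
> >
> >     // categorisation: REFERENCE properties point back at the rule, so the
> >     // same rule can be reached from many categories
> >     Node category = session.getRootNode().getNode("categories/pricing");
> >     category.setProperty("rule", rule);
> >     session.save();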
> >
> > I have seen from past posts that people put nodes under different parents
> > (so there is no great number of child nodes), so that is one option, but
> > my gut feel is that it's the WeakIdentityCollection: this well-meaning
> > code means that the GC has to do a huge amount of work at the worst
> > possible time (when under stress). I am sure most of the time this is not
> > an issue.
> >
> > Any ideas/tips/gotchas for a newbie? I would really like to be confident
> > that I can scale up enough (it's modest) with JCR for this purpose.
> >
> > On 8/31/06, Nicolas <ntoper@gmail.com> wrote:
> > >
> > > 2 more ideas:
> > >
> > > 1/ Did you try using a memory profiler, so we can find out what is wrong?
> > >
> > > 2/ What happens if you log out after, say, 100 updates?
> > >
> > >
> > > a+
> > > Nico
> > > my blog! http://www.deviant-abstraction.net !!
> > >
> > >
> >
> >
>
