jackrabbit-users mailing list archives

From "Michael Neale" <michael.ne...@gmail.com>
Subject Re: OutOfMemory - adding lots of nodes in one session
Date Wed, 06 Sep 2006 12:14:08 GMT
ok, that makes sense. Yes, I definitely do want versionability, and yes, I am
happy to pay that performance cost (it is small in reality). The only issue I
had was memory use when trying to load up the system to do some measurements.
I got there in the end; thanks for everyone's help!

Michael.

On 9/6/06, Stefan Guggisberg <stefan.guggisberg@gmail.com> wrote:
>
> On 9/4/06, Stefan Guggisberg <stefan.guggisberg@gmail.com> wrote:
> > On 9/4/06, Michael Neale <michael.neale@gmail.com> wrote:
> > > Hi Stefan.
> > >
> > > Node types attached, and the example code that rips through it and saves
> > > stuff. Let me know if there is anything obvious I am doing wrong!
> > >
> > >
> > > Anyone interested can download the loop code and node types from this zip:
> > > http://www.users.on.net/~michaelneale/work/jackrabbit_perf.zip
> >
> > thanks, michael. i'll have a look at it, hopefully sometime this week,
> > and i'll get back to you with my findings.
>
> i solved the mystery ;) your node type extends from mix:versionable
> which explains why using nt:unstructured provides better performance.
>
> if you add mix:versionable to your nt:unstructured nodes you'll get about
> the same performance figures as when using your own node types.
>
> versionability of a node inevitably incurs additional overhead (such as
> allocating resources in the version store). unless you really need
> versionability i'd suggest avoiding mix:versionable in your node type model.
>
> cheers
> stefan
>
> >
> > cheers
> > stefan
> >
> > >
> > > On 9/4/06, Stefan Guggisberg <stefan.guggisberg@gmail.com> wrote:
> > > >
> > > > hi michael,
> > > >
> > > > On 9/4/06, Michael Neale <michael.neale@gmail.com> wrote:
> > > > > hi Stefan.
> > > > >
> > > > > > Yes I was able to make it rip through saving lots of simple nodes
> > > > > > like that no problem.
> > > > > > When I add more properties, it degrades a fair bit (probably not
> > > > > > surprising if I guess at how the data is being stored for each
> > > > > > property).
> > > > >
> > > > > > Interestingly, when I use my own specific node type it slows down
> > > > > > quite a lot (and memory consumption goes up) compared with
> > > > > > nt:unstructured, even though all other properties are set in the
> > > > > > same way. I had to bump up the memory quite a lot to avoid
> > > > > > OutOfMemoryErrors.
> > > >
> > > > > that's indeed very interesting and comes as a surprise. would you
> > > > > mind sharing with us your node type definitions and some sample
> > > > > code? i'd like to investigate this further.
> > > >
> > > > cheers
> > > > stefan
> > > >
> > > > >
> > > > > > In the end, when I batched things up, I was able to ramp up the
> > > > > > number of nodes to what I wanted to test. Performance was
> > > > > > acceptable once it was loaded up - it is definitely the save()
> > > > > > operations that are the most expensive. It was just very, very
> > > > > > difficult to build up my test data without killing memory.
> > > > >
> > > > > > Thanks everyone for your help, I have learned a lot about
> > > > > > jackrabbit in the meantime.
> > > > >
> > > > > > On 9/1/06, Stefan Guggisberg <stefan.guggisberg@gmail.com> wrote:
> > > > > >
> > > > > > hi michael
> > > > > >
> > > > > > > i quickly ran a test which successfully added 20k child nodes to
> > > > > > > the same parent (whether that's a useful content model is a
> > > > > > > different story...).
> > > > > >
> > > > > > here's the code i used to test:
> > > > > >
> > > > > >     Node parent = root.addNode("foo", "nt:unstructured");
> > > > > >     for (int i = 1; i <= 20000; i++) {
> > > > > >         parent.addNode("bar");
> > > > > >         if (i % 1000 == 0) {
> > > > > >             root.save();
> > > > > >             System.out.println("added 1000 child nodes; total="
> + i);
> > > > > >         }
> > > > > >     }
> > > > > >
> > > > > > > note that save() is a relatively expensive operation; it
> > > > > > > therefore makes sense to batch multiple addNode etc calls
> > > > > > > (which are relatively inexpensive).
> > > > > >
> > > > > > > please provide a simple self-contained test case that
> > > > > > > reproduces the behaviour you're describing.
> > > > > >
> > > > > > cheers
> > > > > > stefan
> > > > > >
> > > > > > On 9/1/06, Michael Neale <michael.neale@gmail.com> wrote:
> > > > > > > 1:
> > > > > > > > yeah I use JProfiler - top of the charts with a bullet was:
> > > > > > > > org.apache.jackrabbit.util.WeakIdentityCollection$WeakRef (a ha!
> > > > > > > > that would explain the performance slug when GC has to kick in
> > > > > > > > late in the piece).
> > > > > > > followed by:
> > > > > > > org.apache.derby.impl.store.raw.data.StoredRecordHeader
> > > > > > > and of course a whole lot of byte[].
> > > > > > >
> > > > > > > > I am using default everything (which means Derby) and no blobs
> > > > > > > > whatsoever (so all in the database).
> > > > > > >
> > > > > > > 2:
> > > > > > > > If I logout, and use fresh everything, it seems to continue fine
> > > > > > > > (ie at a fast enough pace), but I haven't really pushed it to
> > > > > > > > where I wanted to get it (10000 child nodes).
> > > > > > >
> > > > > > > > Responding to Alexandru's email (hi alex, nice work on InfoQ if I
> > > > > > > > remember correctly! I am a fan), it would seem that the Session
> > > > > > > > keeps most in memory, which I can understand.
> > > > > > >
> > > > > > > > I guess my problem is that I am trying to load up the system to
> > > > > > > > test, really basically, that it scales to the numbers that I know
> > > > > > > > I need to scale to, but I am having trouble getting the data in -
> > > > > > > > bulk load wise. If I bump up the memory, it certainly seems to hum
> > > > > > > > along better, but if Session is keeping a lot around, then this
> > > > > > > > will have limits - is there no way to "clear" the session?
> > > > > > >
> > > > > > > > Perhaps I will explain what I am using JCR for (feel free to smack
> > > > > > > > me down if this is not what JCR and Jackrabbit are ever intended
> > > > > > > > for):
> > > > > > > > I am storing "atomic business rules" (which means each node is a
> > > > > > > > small single business rule). The data on each node is very small.
> > > > > > > > These nodes are stored flat as child nodes under a top level node.
> > > > > > > > To give structure (categorisation) for the users, I have
> > > > > > > > references to these nodes all over the place so people can
> > > > > > > > navigate them all sorts of different ways (as there is no one
> > > > > > > > clear hierarchy at the time the rules are created). JCR gives me
> > > > > > > > most of what I need, but as these rule nodes can number in the
> > > > > > > > thousands (4000 is not uncommon for a reasonably complex business
> > > > > > > > unit), I am worried that this just can't work.
> > > > > > >
> > > > > > > > I have seen from past posts that people put nodes under different
> > > > > > > > parents (so there is no great number of child nodes) so that is
> > > > > > > > one option, but my gut feel is that it's the
> > > > > > > > WeakIdentityCollection: this well-meaning code means that the GC
> > > > > > > > has to do a huge amount of work at the worst possible time (when
> > > > > > > > under stress). I am sure most of the time this is not an issue.
> > > > > > >
> > > > > > > > Any ideas/tips/gotchas for a newbie? I would really like to be
> > > > > > > > confident that I can scale up enough (it's modest) with JCR for
> > > > > > > > this purpose.
> > > > > > >
> > > > > > > On 8/31/06, Nicolas <ntoper@gmail.com> wrote:
> > > > > > > >
> > > > > > > > 2 more ideas:
> > > > > > > >
> > > > > > > > > 1/ Did you try using a memory profiler so we can know what is
> > > > > > > > > wrong?
> > > > > > > >
> > > > > > > > 2/ What happens if you logout after say 100 updates?
> > > > > > > >
> > > > > > > >
> > > > > > > > a+
> > > > > > > > Nico
> > > > > > > > my blog! http://www.deviant-abstraction.net !!
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> >
>
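
For anyone reading this thread later, here is a minimal sketch of why the
batching advice helps with memory: saving every N additions bounds how much
unsaved state is held at once. It deliberately does not use the JCR API.
PendingStore is a hypothetical stand-in for a session that buffers unsaved
items until save(), and load(), saveCalls(), persisted() and peakPending()
are names made up for this illustration. With Jackrabbit the corresponding
calls would be addNode() and save(), as in stefan's loop in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedSaveSketch {

    /** Hypothetical stand-in for a session: buffers unsaved items until save(). */
    static class PendingStore {
        private final List<String> pending = new ArrayList<>();
        private int persisted = 0;
        private int saveCalls = 0;
        private int peakPending = 0;

        /** Cheap operation: only records the item in memory. */
        void add(String name) {
            pending.add(name);
            peakPending = Math.max(peakPending, pending.size());
        }

        /** Expensive operation in a real repository: flushes everything buffered so far. */
        void save() {
            persisted += pending.size();
            pending.clear();
            saveCalls++;
        }

        int saveCalls()   { return saveCalls; }
        int persisted()   { return persisted; }
        int peakPending() { return peakPending; }
        int pendingCount() { return pending.size(); }
    }

    /** Adds {@code total} items, saving after every {@code batchSize} additions. */
    static PendingStore load(int total, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        PendingStore store = new PendingStore();
        for (int i = 1; i <= total; i++) {
            store.add("bar" + i);
            if (i % batchSize == 0) {
                store.save();
            }
        }
        // Flush a final partial batch, if the total is not a multiple of batchSize.
        if (store.pendingCount() > 0) {
            store.save();
        }
        return store;
    }

    public static void main(String[] args) {
        PendingStore s = load(20000, 1000);
        System.out.println("saves=" + s.saveCalls()
                + " persisted=" + s.persisted()
                + " peakPending=" + s.peakPending());
    }
}
```

With the numbers used in this thread (20000 nodes, batches of 1000) this
prints saves=20 persisted=20000 peakPending=1000: only 20 of the expensive
save calls, and never more than one batch of unsaved items in memory. How
much memory a real Jackrabbit session holds per unsaved node is not modelled
here.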
