jackrabbit-users mailing list archives

From "Stefan Guggisberg" <stefan.guggisb...@gmail.com>
Subject Re: importxml memory
Date Wed, 22 Aug 2007 15:56:52 GMT
hi steve

On 7/30/07, Steven Singer <Steven.Singer@radintl.com> wrote:
> How are people using importxml to restore or import anything but small
> amounts of data into the repository? I have a 22meg xml file that I'm
> unable to import because I keep running out of memory.

i analyzed the xml file that you sent me offline (thanks!).
i noticed the following:

1) system view xml export
2) file size: 22mb without whitespace,
    => 650mb with simple 2-space indentation (!)
3) 23k nodes and 202k properties
4) virtually every node is versionable
5) *very* deep structure: max depth is 2340... (!)
6) lots of junk data (e.g. thousands of _delete_me1234567890 nodes,
    btw hundreds/thousands of levels deep and all versionable)
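
figures like the node count, property count and max depth above can be gathered in one streaming pass over a system view export, without loading it into a repository. the sketch below is only an illustration (not necessarily how the numbers above were produced); the class name SysViewStats and the analyze helper are made up for this example. depth here means nesting of sv:node elements, counting the exported top node as 1.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: counts sv:node / sv:property elements and tracks max nesting. */
class SysViewStats extends DefaultHandler {
    // Namespace URI of the JCR system view format.
    static final String SV_NS = "http://www.jcp.org/jcr/sv/1.0";

    long nodes;
    long properties;
    int depth;
    int maxDepth;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!SV_NS.equals(uri)) {
            return;
        }
        if ("node".equals(localName)) {
            nodes++;
            depth++;
            maxDepth = Math.max(maxDepth, depth);
        } else if ("property".equals(localName)) {
            properties++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (SV_NS.equals(uri) && "node".equals(localName)) {
            depth--;
        }
    }

    /** Streams the export through a namespace-aware SAX parser and returns the totals. */
    static SysViewStats analyze(InputSource in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        SysViewStats stats = new SysViewStats();
        reader.setContentHandler(stats);
        reader.parse(in);
        return stats;
    }
}
```

since SAX never builds a tree, a pass like this needs memory proportional to neither file size nor depth.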

i'd say that the content model has lots of room for improvement ;)

mainly 5) accounts for the excessive memory consumption during
import. while this could certainly be improved in jackrabbit i can't think of a
really good use case for creating >2k level deep hierarchies.

i would also suggest reviewing the use of mix:versionable. versionability
doesn't come for free since it implies a certain overhead. making 1 node
mix:versionable creates approx. 7 nodes and 13 properties in the version store
(version history, root version etc etc). mix:versionable should therefore only
be used where needed.
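
to put the overhead in numbers, here is a rough back-of-the-envelope sketch. it assumes the approximate per-node figures quoted above (7 nodes, 13 properties) hold uniformly and that all ~23k nodes are versionable, so treat the result as an upper bound. the class and method names are made up for the example.

```java
/** Hypothetical estimate of version store overhead, using the approximate figures above. */
class VersionStoreEstimate {
    // Approximate items created in the version store per mix:versionable node.
    static final int NODES_PER_VERSIONABLE = 7;
    static final int PROPS_PER_VERSIONABLE = 13;

    static long extraNodes(long versionableNodes) {
        return versionableNodes * NODES_PER_VERSIONABLE;
    }

    static long extraProperties(long versionableNodes) {
        return versionableNodes * PROPS_PER_VERSIONABLE;
    }
}
```

for 23,000 versionable nodes this gives 161,000 additional nodes and 299,000 additional properties, i.e. the version store would hold several times more nodes than the 23k in the content itself.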

btw: by using a decorated content handler which performed a save every
200 nodes i was able to import the data with 512mb heap. it took about
30 minutes on a macbook pro (2ghz).
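
a minimal sketch of what such a decorated content handler could look like (not the actual code used for the import above). it wraps any SAX ContentHandler, forwards every event, and calls a save callback after every N sv:node start elements plus once at the end of the document. the names PeriodicSaveHandler and Saver are made up; saving right after a node's start element has been forwarded is an assumption about where a save is safe.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical callback; with JCR this would wrap Session.save(). */
interface Saver {
    void save() throws Exception;
}

/** Forwards all SAX events to a delegate and saves every 'interval' sv:node elements. */
class PeriodicSaveHandler extends XMLFilterImpl {
    private static final String SV_NS = "http://www.jcp.org/jcr/sv/1.0";

    private final Saver saver;
    private final int interval;
    private long nodeCount;

    PeriodicSaveHandler(ContentHandler delegate, Saver saver, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        // XMLFilterImpl forwards each event to the handler set here.
        setContentHandler(delegate);
        this.saver = saver;
        this.interval = interval;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Let the real import handler see the event first, then decide whether to save.
        super.startElement(uri, localName, qName, atts);
        if (SV_NS.equals(uri) && "node".equals(localName) && ++nodeCount % interval == 0) {
            flush();
        }
    }

    @Override
    public void endDocument() throws SAXException {
        super.endDocument();
        flush(); // persist whatever was imported since the last periodic save
    }

    private void flush() throws SAXException {
        try {
            saver.save();
        } catch (Exception e) {
            throw new SAXException("save failed during import", e);
        }
    }
}
```

with the JCR API the delegate would come from Session.getImportContentHandler(parentAbsPath, uuidBehavior) and the callback would call Session.save(), e.g. new PeriodicSaveHandler(session.getImportContentHandler("/", ImportUUIDBehavior.IMPORT_UUID_CREATE_NEW), session::save, 200). the wrapped handler is then fed by a namespace-aware SAX parser reading the export file. the trade-off is that the import is no longer atomic: a failure halfway leaves the already saved part in the repository.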


> The importxml in JCR commands works fine but when I go to save the data
> the jvm memory usage goes up to 1GB and eventually runs out of memory.
> This was sort of discussed at
> http://mail-archives.apache.org/mod_mbox/jackrabbit-users/200610.mbox/browser
> but I didn't see any solutions proposed.
> Does the backup tool suffer from the same problem (being unable to restore
> content above a certain size?)  How have other people handled migrating
> data between different persistence managers or changing a node-type
> definition that seems to require a re-import?
> Steven Singer
> RAD International Ltd.
