jackrabbit-users mailing list archives

From "Stefan Guggisberg" <stefan.guggisb...@gmail.com>
Subject Re: importxml memory
Date Thu, 23 Aug 2007 07:51:41 GMT
On 8/23/07, Michael Neale <michael.neale@gmail.com> wrote:
> Stefan - the exported XML won't actually contain the versions, will it? i.e. if
> you export/import you lose your versions (assuming you import into a fresh
> repo).

yes, that's correct.

>
> If that is the case - how does this affect the exported data?

on import of a versionable node, its version history and related nodes (root
version etc.) are automatically created in the version store. this negatively
affects the performance of the import.

cheers
stefan

>
> (am also interested if it is possible to export versions).
>
> Michael
>
> On 8/23/07, Stefan Guggisberg <stefan.guggisberg@gmail.com> wrote:
> >
> > hi steve
> >
> > On 7/30/07, Steven Singer <Steven.Singer@radintl.com> wrote:
> > >
> > > How are people using importxml to restore or import anything but small
> > > amounts of data into the repository? I have a 22meg xml file that I'm
> > > unable to import because I keep running out of memory.
> >
> > i analyzed the xml file that you sent me offline (thanks!).
> > i noticed the following:
> >
> > 1) system view xml export
> > 2) file size: 22mb without whitespace,
> >     => 650mb with simple 2-space indentation (!)
> > 3) 23k nodes and 202k properties
> > 4) virtually every node is versionable
> > 5) *very* deep structure: max depth is 2340... (!)
> > 6) lots of junk data (e.g. thousands of _delete_me1234567890 nodes,
> >     btw hundreds/thousands of levels deep and all versionable)
> >
> > i'd say that the content model has lots of room for improvement ;)
> >
> > mainly 5) accounts for the excessive memory consumption during
> > import. while this could certainly be improved in jackrabbit, i can't think
> > of a really good use case for creating >2k level deep hierarchies.
> >
> > i would also suggest reviewing the use of mix:versionable. versionability
> > doesn't come for free since it implies a certain overhead: making 1 node
> > mix:versionable creates approx. 7 nodes and 13 properties in the version
> > store (version history, root version etc.). mix:versionable should
> > therefore only be used where needed.
> >
> > btw: by using a decorated content handler which performed a save every
> > 200 nodes i was able to import the data with 512mb heap. it took about
> > 30 minutes on a macbook pro (2ghz).
> >
> > cheers
> > stefan
> >
> > >
> > > The importxml in JCR commands works fine, but when I go to save the data
> > > the JVM memory usage goes up to 1GB and eventually runs out of memory.
> > > This was sort of discussed in
> > > http://mail-archives.apache.org/mod_mbox/jackrabbit-users/200610.mbox/browser
> > > but I didn't see any solutions proposed.
> > >
> > > Does the backup tool suffer from the same problem (being unable to restore
> > > content above a certain size)? How have other people handled migrating
> > > data between different persistence managers or changing a node-type
> > > definition that seems to require a re-import?
> > >
> > >
> > >
> > >
> > > Steven Singer
> > > RAD International Ltd.
> > >
> > >
> >
>
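The statistics in Stefan's analysis of the system view export (node and property counts, maximum depth, how many nodes are versionable) can be gathered without loading the file into memory. The following is a minimal sketch, not the tool he used: a SAX handler that streams the export, counts `sv:node` and `sv:property` elements, tracks nesting depth, and counts nodes whose `jcr:mixinTypes` property explicitly lists `mix:versionable`. The class name `SysViewStats` is hypothetical, and mixins inherited through custom node types are not detected.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: streams a JCR system view export and collects size statistics. */
class SysViewStats extends DefaultHandler {
    private static final String SV_NS = "http://www.jcp.org/jcr/sv/1.0";

    int nodes;            // number of sv:node elements
    int properties;       // number of sv:property elements
    int maxDepth;         // deepest sv:node nesting seen
    int versionableNodes; // nodes whose jcr:mixinTypes explicitly lists mix:versionable

    private int depth;
    private boolean inMixinTypes;    // inside <sv:property sv:name="jcr:mixinTypes">
    private StringBuilder valueText; // non-null only while reading a mixin value

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!SV_NS.equals(uri)) return;
        switch (localName) {
            case "node":
                nodes++;
                maxDepth = Math.max(maxDepth, ++depth);
                break;
            case "property":
                properties++;
                inMixinTypes = "jcr:mixinTypes".equals(atts.getValue(SV_NS, "name"));
                break;
            case "value":
                if (inMixinTypes) valueText = new StringBuilder();
                break;
            default:
                break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (valueText != null) valueText.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!SV_NS.equals(uri)) return;
        switch (localName) {
            case "node":
                depth--;
                break;
            case "property":
                inMixinTypes = false;
                break;
            case "value":
                if (valueText != null && "mix:versionable".equals(valueText.toString().trim())) {
                    versionableNodes++;
                }
                valueText = null;
                break;
            default:
                break;
        }
    }

    /** Parses with a namespace-aware SAX parser; no document tree is built in memory. */
    static SysViewStats analyze(InputSource in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so uri/localName identify sv:* elements
        SysViewStats stats = new SysViewStats();
        factory.newSAXParser().parse(in, stats);
        return stats;
    }
}
```

SAX is used here rather than DOM because a DOM tree would hold the entire export in memory, and a recursive walk over a tree 2340 levels deep could exhaust the thread stack; the event-based handler only keeps a depth counter.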

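The decorated content handler Stefan used (a save every 200 nodes) is not shown in the thread. The sketch below is one possible shape for it, under stated assumptions: it wraps the `ContentHandler` obtained from `Session.getImportContentHandler`, counts completed `sv:node` elements of a system view import, and runs a caller-supplied save action every `batchSize` nodes plus once at the end of the document. `SavingContentHandler` and `SaveAction` are hypothetical names; the save action is a parameter only so the class can be exercised without a repository.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical decorator: forwards all SAX events and saves every batchSize sv:node elements. */
class SavingContentHandler extends XMLFilterImpl {
    private static final String SV_NS = "http://www.jcp.org/jcr/sv/1.0";

    /** Assumed callback; with JCR this would typically be session::save. */
    @FunctionalInterface
    interface SaveAction {
        void save() throws Exception;
    }

    private final int batchSize;
    private final SaveAction saveAction;
    private int nodesSinceSave = 0;
    private int saveCount = 0;

    SavingContentHandler(ContentHandler delegate, int batchSize, SaveAction saveAction) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.saveAction = saveAction;
        setContentHandler(delegate); // XMLFilterImpl forwards every event to this handler
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName); // let the importer finish the node first
        if (SV_NS.equals(uri) && "node".equals(localName) && ++nodesSinceSave >= batchSize) {
            flush();
        }
    }

    @Override
    public void endDocument() throws SAXException {
        super.endDocument(); // for a JCR import handler this is where the import is finished
        flush();             // persist whatever is left, including end-of-import changes
    }

    private void flush() throws SAXException {
        try {
            saveAction.save();
            saveCount++;
            nodesSinceSave = 0;
        } catch (Exception e) {
            throw new SAXException("intermediate save failed", e);
        }
    }

    int getSaveCount() {
        return saveCount;
    }
}
```

With a JCR session it could be wired up as `new SavingContentHandler(session.getImportContentHandler("/", ImportUUIDBehavior.IMPORT_UUID_COLLISION_THROW), 200, session::save)` (the target path `"/"` is only an example) and then registered as the content handler of a namespace-aware SAX `XMLReader`. Saving part-way keeps the transient space small, but each batch is committed on its own, so a failure leaves a partial import behind; and if the content contains REFERENCE properties, an intermediate save could fail when a reference points at a node that has not been imported yet. Neither case is handled in this sketch.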