jackrabbit-users mailing list archives

From "Michael Neale" <michael.ne...@gmail.com>
Subject Re: importxml memory
Date Thu, 23 Aug 2007 04:38:15 GMT
Stefan - the exported XML won't actually contain the versions, will it? i.e.
if you export/import, you lose your versions (assuming you import into a
fresh repo).

If that is the case - how does this affect the exported data?

(I am also interested in whether it is possible to export versions.)
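
For reference, by "export" I mean a plain system view export of the content
subtree, along the lines of this sketch ("/myapp" is a made-up example path):

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import javax.jcr.Session;

    void exportContent(Session session) throws Exception {
        OutputStream out = new FileOutputStream("export.xml");
        try {
            // skipBinary=false keeps binary properties,
            // noRecurse=false descends into all child nodes
            session.exportSystemView("/myapp", out, false, false);
        } finally {
            out.close();
        }
    }

My understanding is that the version histories live under
/jcr:system/jcr:versionStorage, which an export like this never touches.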

Michael

On 8/23/07, Stefan Guggisberg <stefan.guggisberg@gmail.com> wrote:
>
> hi steve
>
> On 7/30/07, Steven Singer <Steven.Singer@radintl.com> wrote:
> >
> > How are people using importxml to restore or import anything but small
> > amounts of data into the repository? I have a 22meg xml file that I'm
> > unable to import because I keep running out of memory.
>
> i analyzed the xml file that you sent me offline (thanks!).
> i noticed the following:
>
> 1) system view xml export
> 2) file size: 22mb without whitespace,
>     => 650mb with simple 2-space indentation (!)
> 3) 23k nodes and 202k properties
> 4) virtually every node is versionable
> 5) *very* deep structure: max depth is 2340... (!)
> 6) lots of junk data (e.g. thousands of _delete_me1234567890 nodes,
>     which are, by the way, hundreds/thousands of levels deep and all
>     versionable)
>
> i'd say that the content model has lots of room for improvement ;)
>
> mainly 5) accounts for the excessive memory consumption during
> import. while this could certainly be improved in jackrabbit, i can't
> think of a really good use case for creating hierarchies more than
> 2k levels deep.
>
> i'd also suggest reviewing the use of mix:versionable. versionability
> doesn't come for free since it implies a certain overhead: making one
> node mix:versionable creates approx. 7 nodes and 13 properties in the
> version store (version history, root version etc.). mix:versionable
> should therefore only be used where needed.
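>
> e.g. a minimal sketch of what i mean - node and variable names are
> made up; the point is just to add the mixin per node, where needed:
>
>   import javax.jcr.Node;
>   import javax.jcr.RepositoryException;
>   import javax.jcr.Session;
>
>   // only flag the nodes that really need a version history; every
>   // mix:versionable node costs ~7 nodes / 13 properties in the
>   // version store.
>   Node addDocument(Session session, Node parent, boolean needsHistory)
>           throws RepositoryException {
>       Node doc = parent.addNode("document", "nt:unstructured");
>       if (needsHistory) {  // application-specific decision
>           doc.addMixin("mix:versionable");
>       }
>       session.save();
>       return doc;
>   }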
>
> btw: by using a decorated content handler that performed a save every
> 200 nodes, i was able to import the data with a 512mb heap. it took
> about 30 minutes on a macbook pro (2ghz).
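>
> the decorator was roughly along these lines - a sketch from memory,
> not the exact code i used; the handler to wrap comes from
> Session#getImportContentHandler:
>
>   import javax.jcr.RepositoryException;
>   import javax.jcr.Session;
>   import org.xml.sax.Attributes;
>   import org.xml.sax.ContentHandler;
>   import org.xml.sax.SAXException;
>   import org.xml.sax.helpers.XMLFilterImpl;
>
>   // forwards all sax events to the import handler and saves the
>   // session every 200 sv:node elements, so the transient space
>   // never has to hold the entire import at once.
>   class PeriodicSaveHandler extends XMLFilterImpl {
>
>       private static final String SV_URI =
>               "http://www.jcp.org/jcr/sv/1.0";
>
>       private final Session session;
>       private int nodeCount = 0;
>
>       PeriodicSaveHandler(ContentHandler importHandler, Session session) {
>           setContentHandler(importHandler);
>           this.session = session;
>       }
>
>       public void startElement(String uri, String local, String qName,
>                                Attributes atts) throws SAXException {
>           super.startElement(uri, local, qName, atts);
>           if (SV_URI.equals(uri) && "node".equals(local)
>                   && ++nodeCount % 200 == 0) {
>               try {
>                   session.save();  // flush transient changes to the store
>               } catch (RepositoryException e) {
>                   throw new SAXException(e);
>               }
>           }
>       }
>   }
>
> you'd then feed the file to a plain sax parser, using something like
> new PeriodicSaveHandler(
>     session.getImportContentHandler("/import/target",
>         ImportUUIDBehavior.IMPORT_UUID_CREATE_NEW), session)
> as the content handler ("/import/target" is just an example path).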
>
> cheers
> stefan
>
> >
> > The importxml in JCR commands works fine, but when I go to save the
> > data, the jvm memory usage goes up to 1GB and it eventually runs out
> > of memory.
> > This was sort of discussed at
> > http://mail-archives.apache.org/mod_mbox/jackrabbit-users/200610.mbox/browser
> > but I didn't see any solutions proposed.
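> >
> > As far as I can tell, the command boils down to something like this
> > sketch (file name made up, not the actual command code):
> >
> >   import java.io.FileInputStream;
> >   import javax.jcr.ImportUUIDBehavior;
> >   import javax.jcr.Session;
> >
> >   void importBackup(Session session) throws Exception {
> >       // the import first builds the whole file up as transient
> >       // state in the session...
> >       session.importXML("/", new FileInputStream("backup.xml"),
> >               ImportUUIDBehavior.IMPORT_UUID_COLLISION_THROW);
> >       // ...and persisting it in one go is when memory climbs to 1GB
> >       session.save();
> >   }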
> >
> > Does the backup tool suffer from the same problem (being unable to
> > restore content above a certain size)? How have other people handled
> > migrating data between different persistence managers, or changing a
> > node-type definition that seems to require a re-import?
> >
> > Steven Singer
> > RAD International Ltd.
> >
> >
>
