jackrabbit-users mailing list archives

From "Stefan Guggisberg" <stefan.guggisb...@gmail.com>
Subject Re: persistance
Date Fri, 14 Sep 2007 09:15:21 GMT
On 9/14/07, Florent Guillaume <fg@nuxeo.com> wrote:
> If you import that big a file, you should import directly into the
> workspace and not in the session, without going through the transient
> space and using lots of memory.
> So use Workspace.getImportContentHandler or Workspace.importXML, not the
> Session methods. Read the JSR-170 for the benefits.

that's absolutely correct, theoretically ;-) the workspace methods avoid the
transient layer. however, in the current implementation of jackrabbit
the workspace import methods are still memory-bound because the
entire change log is kept in memory until commit on endDocument.

for very large imports i'd therefore suggest using the session
import methods and saving in batches of e.g. 1000 items by means
of a ContentHandler decorator.
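
a minimal sketch of such a decorator, not a tested jackrabbit recipe. it
only depends on SAX; BatchingContentHandler and Flusher are hypothetical
names made up for illustration, and it counts XML elements rather than
JCR items, so the batch size is only a rough proxy.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Forwards all SAX events to a delegate handler and invokes a flush
// callback after every batchSize closed elements.
class BatchingContentHandler extends XMLFilterImpl {

    // Hypothetical callback; in a JCR import it would call session.save().
    interface Flusher {
        void flush() throws Exception;
    }

    private final int batchSize;
    private final Flusher flusher;
    private int pending = 0; // elements closed since the last flush

    BatchingContentHandler(ContentHandler delegate, int batchSize, Flusher flusher) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        // XMLFilterImpl forwards every ContentHandler event to this delegate.
        setContentHandler(delegate);
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName); // let the importer see the event first
        if (++pending >= batchSize) {
            flush();
        }
    }

    @Override
    public void endDocument() throws SAXException {
        super.endDocument();
        if (pending > 0) {
            flush(); // persist whatever is left of the last, partial batch
        }
    }

    private void flush() throws SAXException {
        try {
            flusher.flush();
        } catch (Exception e) {
            throw new SAXException("intermediate save failed", e);
        }
        pending = 0;
    }
}
```

assuming a JCR 1.0 session, the delegate would come from
session.getImportContentHandler(parentAbsPath,
ImportUUIDBehavior.IMPORT_UUID_CREATE_NEW) and the callback would be
session::save. an intermediate save may still fail if a node's mandatory
child items have not been imported yet, so the point at which it is safe
to save depends on your node types.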

hope this helps.

cheers
stefan

>
> Florent
>
> chewy_fruit_loop wrote:
> > I'm currently trying to import an XML file into a bog standard empty
> > repository.
> > The problem is the file is 72.5 MB, containing around 200,000 elements (yes
> > they are all required).  This is currently taking about 90 mins (give or
> > take) to get into derby, and that's with indexing off.
> >
> > The time wouldn't be such an issue if it didn't use 1.7 GB of RAM.
> > I've decorated a ContentHandler so it calls :
> >
> > root.update(<workspace name>)
> > root.save()
> >
> > where root is the root node from the tree.
> > This is being called after every 500 start elements.  The save just doesn't
> > seem to flush the contents that have been parsed to the persistent store.
> > This is the same if I use derby or Oracle as storage.  The only time things
> > seem to start to be persisted is when the endDocument is hit.
> >
> > have I missed something blindingly obvious here?  I really don't mind
> > everyone having a bit of a chuckle at me, I just want to get this sorted
> > out.
> >
> >
> > thanks
> >
>
>
> --
> Florent Guillaume, Director of R&D, Nuxeo
> Open Source Enterprise Content Management (ECM)
> http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87
>
>
