jackrabbit-users mailing list archives

From chewy_fruit_loop <chewy_fruit_l...@yahoo.com>
Subject Re: persistance
Date Fri, 14 Sep 2007 09:38:28 GMT

Stefan, you're spot on there.
Thanks a million; I only had to remove the
  root.update(<workspaceName>)
call from my save code and that worked.

And the code for getting a ContentHandler now looks like:

  ContentHandler cHandler = session.getImportContentHandler("/",
          ImportUUIDBehavior.IMPORT_UUID_CREATE_NEW);

where it used to be:

  ContentHandler cHandler = session.getWorkspace().getImportContentHandler("/",
          ImportUUIDBehavior.IMPORT_UUID_CREATE_NEW);
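
For anyone following along, feeding the file into that handler looks something
like this (rough, untested sketch; the file name and the SAX setup are just
placeholders, using the org.xml.sax classes plus javax.jcr.ImportUUIDBehavior):

  // session import: imported nodes sit in the transient space until save()
  ContentHandler cHandler = session.getImportContentHandler("/",
          ImportUUIDBehavior.IMPORT_UUID_CREATE_NEW);

  // drive the handler with a plain SAX parser
  XMLReader reader = XMLReaderFactory.createXMLReader();
  reader.setContentHandler(cHandler);
  reader.parse(new InputSource(new FileInputStream("export.xml")));

  // persist anything still pending after endDocument
  session.save();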


Hopefully the next unfortunate soul to come across this little doozy will
find this post and save themselves a bunch of grief.


Thanks all :)
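
In case it saves the next person some typing, the batching decorator Stefan
suggests below could look something like this (rough, untested sketch; the
class name and the 1000-element interval are made up, and every other
ContentHandler method just delegates to the wrapped handler):

  import javax.jcr.RepositoryException;
  import javax.jcr.Session;

  import org.xml.sax.Attributes;
  import org.xml.sax.ContentHandler;
  import org.xml.sax.Locator;
  import org.xml.sax.SAXException;

  public class BatchingImportHandler implements ContentHandler {

      private static final int SAVE_INTERVAL = 1000;

      private final ContentHandler target;  // from session.getImportContentHandler()
      private final Session session;
      private int count;

      public BatchingImportHandler(ContentHandler target, Session session) {
          this.target = target;
          this.session = session;
      }

      public void startElement(String uri, String localName, String qName,
                               Attributes atts) throws SAXException {
          target.startElement(uri, localName, qName, atts);
          if (++count % SAVE_INTERVAL == 0) {
              save();  // flush this batch out of the transient space
          }
      }

      public void endDocument() throws SAXException {
          target.endDocument();
          save();      // flush whatever is left at the end
      }

      private void save() throws SAXException {
          try {
              session.save();
          } catch (RepositoryException e) {
              throw new SAXException(e);
          }
      }

      // everything else just delegates
      public void setDocumentLocator(Locator l) { target.setDocumentLocator(l); }
      public void startDocument() throws SAXException { target.startDocument(); }
      public void startPrefixMapping(String p, String u) throws SAXException { target.startPrefixMapping(p, u); }
      public void endPrefixMapping(String p) throws SAXException { target.endPrefixMapping(p); }
      public void endElement(String u, String l, String q) throws SAXException { target.endElement(u, l, q); }
      public void characters(char[] ch, int s, int len) throws SAXException { target.characters(ch, s, len); }
      public void ignorableWhitespace(char[] ch, int s, int len) throws SAXException { target.ignorableWhitespace(ch, s, len); }
      public void processingInstruction(String t, String d) throws SAXException { target.processingInstruction(t, d); }
      public void skippedEntity(String n) throws SAXException { target.skippedEntity(n); }
  }

Wiring it in just means handing the parser
new BatchingImportHandler(cHandler, session) instead of cHandler directly.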


Stefan Guggisberg wrote:
> 
> On 9/14/07, Florent Guillaume <fg@nuxeo.com> wrote:
>> If you import that big a file, you should import directly into the
>> workspace and not in the session, without going through the transient
>> space and using lots of memory.
>> So use Workspace.getImportContentHandler or Workspace.importXML, not the
>> Session methods. Read the JSR-170 spec for the benefits.
> 
> that's absolutely correct, theoretically ;-) the workspace methods avoid
> the transient layer. however, in the current implementation of jackrabbit
> the workspace import methods are still memory-bound because the
> entire change log is kept in memory until commit on endDocument.
> 
> for very large imports i'd therefore suggest using the session
> import methods, saving batches of e.g. 1000 items by using a
> ContentHandler decorator.
> 
> hope this helps.
> 
> cheers
> stefan
> 
>>
>> Florent
>>
>> chewy_fruit_loop wrote:
>> > I'm currently trying to import an XML file into a bog-standard empty
>> > repository. The problem is the file is 72.5 MB and contains around
>> > 200,000 elements (yes, they are all required).  This is currently
>> > taking about 90 minutes (give or take) to get into Derby, and that's
>> > with indexing off.
>> >
>> > The time wouldn't be such an issue if it didn't use 1.7 GB of RAM.
>> > I've decorated a ContentHandler so it calls:
>> >
>> > root.update(<workspace name>)
>> > root.save()
>> >
>> > where root is the root node of the tree.
>> > This is being called after every 500 start elements.  The save just
>> > doesn't seem to flush the parsed content to the persistent store.
>> > This is the same whether I use Derby or Oracle as storage.  The only
>> > time things seem to get persisted is when endDocument is hit.
>> >
>> > Have I missed something blindingly obvious here?  I really don't mind
>> > everyone having a bit of a chuckle at me, I just want to get this
>> > sorted out.
>> >
>> >
>> > thanks
>> >
>>
>>
>> --
>> Florent Guillaume, Director of R&D, Nuxeo
>> Open Source Enterprise Content Management (ECM)
>> http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87
>>
>>
> 
> 


