jackrabbit-users mailing list archives

From "Simon Edwards" <si...@gx.nl>
Subject Workspace.importXML() and Memory
Date Fri, 28 Apr 2006 12:52:51 GMT


I've been testing out Jackrabbit this week, mostly trying to find out
how scalable it is. One thing I tried doing was importing a 45 MB XML
file containing about 300,000 XML nodes.

Using Session.importXML() blew the JVM's heap up of course, as it tried
to read everything before writing it to the DB. So I tried
Workspace.importXML() expecting it to write the nodes directly through
to the DB. Unfortunately it seems to act like Session.importXML() and
try to read everything in first before writing to the DB. Of course this
also blew up the JVM (-Xmx512m).

Now, my question is: is it the correct behaviour for
Workspace.importXML() to cache everything in memory first before
writing? And secondly, is there a standard way to stream XML into the DB?

(The JSR-170 spec isn't very clear on this issue. It claims that
Workspace.importXML() has an advantage over Session.importXML() in that
it doesn't store pending nodes in your session before writing. But
Jackrabbit's Workspace seems to act the same way as Session.)
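
To make the streaming question concrete, here is a minimal sketch of one
possible approach: parse the file with a pull parser and flush to the
store every N elements, so only one batch is pending at a time. It uses
only the JDK's StAX API. NodeSink, importInBatches and the batch size
are hypothetical names made up for this illustration; they are not part
of JCR or Jackrabbit, and this is not how either importXML() method is
implemented. In a JCR setting, addNode() could map onto Node.addNode()
and save() onto Session.save(), but that mapping is an assumption and is
not tested here. Attributes and text content are ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class BatchedXmlImport {

    /** Hypothetical stand-in for the repository (not a JCR or Jackrabbit type). */
    interface NodeSink {
        void addNode(String name, int depth); // record one element as a pending node
        void save();                          // flush everything pending to the store
    }

    /**
     * Streams the XML and calls sink.save() after every batchSize elements,
     * plus once at the end if anything is still pending.
     * Returns the total number of elements seen.
     */
    static int importInBatches(InputStream xml, int batchSize, NodeSink sink)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        int total = 0;
        int pending = 0;
        int depth = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    sink.addNode(reader.getLocalName(), depth);
                    depth++;
                    total++;
                    pending++;
                    if (pending >= batchSize) {
                        sink.save();
                        pending = 0;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            if (pending > 0) {
                sink.save(); // flush the final partial batch
            }
        } finally {
            reader.close();
        }
        return total;
    }

    public static void main(String[] args) throws XMLStreamException {
        final int[] saves = {0};
        NodeSink sink = new NodeSink() {
            public void addNode(String name, int depth) { /* a real sink would create the node */ }
            public void save() { saves[0]++; }
        };
        byte[] doc = "<root><a/><b/><c/></root>".getBytes(StandardCharsets.UTF_8);
        int total = importInBatches(new ByteArrayInputStream(doc), 2, sink);
        System.out.println(total + " elements, " + saves[0] + " saves");
    }
}
```

The trade-off is that the import is no longer atomic: a failure halfway
through leaves the earlier batches already saved.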

Thanks in advance,

Simon Edwards
<GX> creative online development B.V.

t: 024 - 3888 261
f: 024 - 3888 621
e: simon@gx.nl
