jackrabbit-dev mailing list archives

From "Felix Meschberger" <Felix.Meschber...@day.com>
Subject Re: atomic vs group node creation/storage
Date Wed, 20 Jun 2007 08:43:49 GMT
Hi Frédéric,

Now this makes a whole lot more sense to me :-)

The first algorithm creates a number of nodes and properties in transient
space, which is currently kept in memory. The higher the number of nodes,
the higher the memory consumption. The second algorithm just creates a
single node and its properties in the transient space before saving them
away and releasing used memory (or at least making it available for GC).
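
To make the memory argument concrete, here is a toy model of the two strategies. It is not Jackrabbit code: the class TransientSpaceModel and its methods are made up for illustration, and the only assumption is the one stated above, namely that unsaved items are held in memory until a save releases them.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a transient space: items pile up until save() flushes them. */
class TransientSpaceModel {
    private final List<String> pending = new ArrayList<>(); // unsaved items held in memory
    private int peak = 0;                                   // largest pending size seen so far

    void addNode(String name) {
        pending.add(name);
        peak = Math.max(peak, pending.size());
    }

    void save() {
        pending.clear(); // after a save the items no longer need to be held
    }

    int peak() {
        return peak;
    }

    /** Add all nodes first, then save once at the end. */
    static int peakWithSingleSave(int count) {
        TransientSpaceModel space = new TransientSpaceModel();
        for (int i = 0; i < count; i++) {
            space.addNode("contractor-" + i);
        }
        space.save();
        return space.peak();
    }

    /** Save after every single node. */
    static int peakWithSavePerNode(int count) {
        TransientSpaceModel space = new TransientSpaceModel();
        for (int i = 0; i < count; i++) {
            space.addNode("contractor-" + i);
            space.save();
        }
        return space.peak();
    }

    public static void main(String[] args) {
        System.out.println("single save:   " + peakWithSingleSave(1000));
        System.out.println("save per node: " + peakWithSavePerNode(1000));
    }
}
```

In this model the peak number of pending items grows with the node count when there is a single save, and stays at one when every node is saved on its own. Real memory use also depends on the properties and child nodes attached to each node, which the model ignores.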

This is currently a limitation of the transient space implementation.
Stefan might have more elaborate details. For the time being, you should
probably go with the "node by node save" algorithm.
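
If saving after every node turns out to be too slow, a possible middle ground (not something proposed in this thread, so treat it as an assumption) is to save every N nodes. The sketch below shows only the control flow. BatchedSave and Step are hypothetical names; in real code the add step would wrap contractors.addNode(...) plus initializeContractor(...), and the save step would wrap session.save().

```java
/** Runs an add step `count` times, calling the save step every `batchSize` additions. */
class BatchedSave {
    /** Stand-in for a repository call that may throw a checked exception. */
    interface Step {
        void run() throws Exception;
    }

    /** Returns the number of times the save step was called. */
    static int run(int count, int batchSize, Step addOne, Step save) throws Exception {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        int saves = 0;
        int unsaved = 0; // additions made since the last save
        for (int i = 0; i < count; i++) {
            addOne.run();
            unsaved++;
            if (unsaved == batchSize) {
                save.run();
                saves++;
                unsaved = 0;
            }
        }
        if (unsaved > 0) { // flush the last, partially filled batch
            save.run();
            saves++;
        }
        return saves;
    }

    public static void main(String[] args) throws Exception {
        int[] added = {0};
        int saves = run(250, 100, () -> added[0]++, () -> { });
        System.out.println(added[0] + " nodes added, " + saves + " saves");
    }
}
```

With batchSize set to 1 this behaves like the "node by node save" algorithm, and with batchSize equal to count it behaves like the single-save one, so the same loop can be used to compare both.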

Hope this helps.

Regards
Felix

PS: In your initial post you seem to have switched the algorithm descriptions,
which caused some confusion :-)

On 6/20/07, Frédéric Esnault <fesn@legisway.com> wrote:
>
> Of course, here is the repository config:
>
> //////////////////////////////////////////////////
> // START REPOSITORY.XML//
> //////////////////////////////////////////////////
>
> <?xml version="1.0"?>
> <!DOCTYPE Repository PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 1.2//EN"
>                             "http://jackrabbit.apache.org/dtd/repository-1.2.dtd">
>
> <Repository>
>         <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
>                 <param name="path" value="${rep.home}/repository"/>
>         </FileSystem>
>
>         <Security appName="Jackrabbit">
>
>                 <!--
>                         access manager:
>                         class: FQN of class implementing the AccessManager interface
>                 -->
>                 <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager">
>                         <!-- <param name="config" value="${rep.home}/access.xml"/> -->
>                 </AccessManager>
>
>                 <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
>                         <!-- anonymous user name ('anonymous' is the default value) -->
>                         <param name="anonymousId" value="anonymous"/>
>                         <!--
>                                 default user name to be used instead of the anonymous user
>                                 when no login credentials are provided (unset by default)
>                         -->
>                         <!-- <param name="defaultUserId" value="superuser"/> -->
>                 </LoginModule>
>
>         </Security>
>
>         <!--
>                 location of workspaces root directory and name of default workspace
>         -->
>         <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>
>
>         <!--
>                 workspace configuration template:
>                 used to create the initial workspace if there's no workspace yet
>         -->
>         <Workspace name="${wsp.name}">
>
>                 <!--
>                         virtual file system of the workspace:
>                         class: FQN of class implementing the FileSystem interface
>                 -->
>                 <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
>                         <param name="path" value="${wsp.home}"/>
>                 </FileSystem>
>
>                 <!--
>                         persistence manager of the workspace:
>                         class: FQN of class implementing the PersistenceManager interface
>                 -->
>                 <PersistenceManager class="org.apache.jackrabbit.core.persistence.db.SimpleDbPersistenceManager">
>                         <param name="driver" value="com.mysql.jdbc.Driver"/>
>                         <param name="url" value="jdbc:mysql:///testJack?autoReconnect=true"/>
>                         <param name="schema" value="mysql"/>
>                         <param name="schemaObjectPrefix" value="${wsp.name}_"/>
>                         <param name="externalBLOBs" value="false"/>
>                         <param name="user" value="root"/>
>                         <param name="password" value="password"/>
>                 </PersistenceManager>
>
>                 <!--
>                         Search index and the file system it uses.
>                         class: FQN of class implementing the QueryHandler interface
>                 -->
>                 <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>                         <param name="path" value="${wsp.home}/index"/>
>                 </SearchIndex>
>
>         </Workspace>
>
>         <!--
>                 Configures the versioning
>         -->
>         <Versioning rootPath="${rep.home}/version">
>
>                 <!--
>                         Configures the filesystem to use for versioning for the respective
>                         persistence manager
>                 -->
>                 <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
>                         <param name="path" value="${rep.home}/version"/>
>                 </FileSystem>
>
>                 <!--
>                         Configures the persistence manager to be used for persisting version state.
>                         Please note that the current versioning implementation is based on
>                         a 'normal' persistence manager, but this could change in future
>                         implementations.
>                 -->
>
>                 <PersistenceManager class="org.apache.jackrabbit.core.persistence.db.SimpleDbPersistenceManager">
>                         <param name="driver" value="com.mysql.jdbc.Driver"/>
>                         <param name="url" value="jdbc:mysql:///testJackVer?autoReconnect=true"/>
>                         <param name="schema" value="mysql"/>
>                         <param name="schemaObjectPrefix" value="version_"/>
>                         <param name="externalBLOBs" value="false"/>
>                         <param name="user" value="root"/>
>                         <param name="password" value="password"/>
>                 </PersistenceManager>
>         </Versioning>
>
>         <!--
>                 Search index for content that is shared repository wide
>                 (/jcr:system tree, contains mainly versions)
>         -->
>         <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>                 <param name="path" value="${rep.home}/repository/index"/>
>         </SearchIndex>
>
> </Repository>
>
> ///////////////////////////////////////////////
> // END  REPOSITORY.XML//
> //////////////////////////////////////////////
>
> And here is the code doing the creation. I am giving you both algorithm
> implementations:
>
>
> /////////////////////////////////////////////////////////////////
> // FIRST ALGORITHM : All Nodes, One Save//
> ////////////////////////////////////////////////////////////////
>
> Node contractors = (Node) session.getItem("/lgw:root/lgw:contractors");
> int count = number_of_nodes; // set this to the number of nodes to create
> for (int i = 0; i < count; i++) {
>         Node contractor = contractors.addNode("lgw:contractor");
>         initializeContractor(session, contractor);
>         created++;
> }
> session.save();
>
> ////////////////////////////////////////////////
> // END FIRST ALGORITHM //
> ////////////////////////////////////////////////
>
> /////////////////////////////////////////////////////////////////////
> // SECOND ALGORITHM : Node by Node//
> /////////////////////////////////////////////////////////////////////
>
> Node contractors = (Node) session.getItem("/lgw:root/lgw:contractors");
> int count = number_of_nodes; // set this to the number of nodes to create
> for (int i = 0; i < count; i++) {
>         Node contractor = contractors.addNode("lgw:contractor");
>         initializeContractor(session, contractor);
>         created++;
>         session.save();
> }
>
> /////////////////////////////////////////////////////
> // END SECOND ALGORITHM //
> ////////////////////////////////////////////////////
>
>
>
> Frédéric Esnault - R&D Engineer
>
>
> -----Original Message-----
> From: Thomas Mueller [mailto:thomas.tom.mueller@gmail.com]
> Sent: Wednesday, 20 June 2007 09:51
> To: dev@jackrabbit.apache.org
> Subject: Re: atomic vs group node creation/storage
>
> Hi,
>
> Could you send the configuration (repository.xml file), and the code
> if possible (so I don't have to write it again)? Just recently I
> thought I saw a similar problem, but I am not sure if it's related.
>
> Thanks,
> Thomas
>
>
> On 6/20/07, Frédéric Esnault <fesn@legisway.com> wrote:
> > Hello there !
> >
> >
> >
> > It seems to me that there is a storage problem when you create a lot of
> > nodes one by one using this algorithm:
> >
> > 1.      for each node to create
> >
> >         a.      create node
> >         b.      fill node properties/child nodes
> >         c.      save session
> >
> > 2.      end for
> >
> >
> >
> > The number of rows (and the size) of the default_node and default_prop
> > tables grows very fast, to an unacceptable degree.
> >
> > I ended up with 35 million rows in the default_node table after inserting
> > 27,000 nodes this way.
> >
> >
> >
> > Then I used the other algorithm :
> >
> > 1.      for each node to create
> >
> >         a.      create node
> >         b.      fill node properties/child nodes
> >
> > 2.      end for
> > 3.      save session
> >
> >
> >
> > This gives a much better result: I currently have a repository holding
> > 36,000 content items, and my table sizes look right (60,000 rows in the
> > node table, 576,000 rows in the property table).
> >
> >
> >
> > The problem is that in a production environment, users will create their
> > nodes one by one, day after day, never in large batches.
> >
> > So is there a storage problem?
> >
> >
> >
> > Frederic Esnault
> >
> >
>
