jackrabbit-users mailing list archives

From "Stefan Guggisberg" <stefan.guggisb...@gmail.com>
Subject Re: Saving of nodes takes too long/Indexing configuration
Date Wed, 18 Jul 2007 10:36:40 GMT
On 7/17/07, Sridhar Raman <sridhar.raman@gmail.com> wrote:
> Ok, this is the kind of node structure that I have.  BOOK ONE, as shown in
> the example, is the basic unit.  It can have multiple Property nodes, and
> each Property node has exactly one Property Value node.
> <sv:node sv:name="BOOK ONE">
>         <sv:property sv:name="jcr:primaryType" sv:type="Name">
>             <sv:value>sr:BookType</sv:value>
>         </sv:property>
>         <sv:property sv:name="sr:lastModifiedOn" sv:type="Date">
>             <sv:value>2007-07-06T15:24:41.125+05:30</sv:value>
>         </sv:property>
>         <sv:property sv:name="sr:data" sv:type="String">
>             <sv:value/>
>         </sv:property>
>         <sv:property sv:name="sr:dimvals" sv:type="Reference">
>             <sv:value>5480a736-ec58-459e-b796-4faf6be581a9</sv:value>
>             <sv:value>b8c478c6-c395-4f7e-b049-91724bd35324</sv:value>
>         </sv:property>
>         <sv:property sv:name="sr:state" sv:type="String">
>             <sv:value>success</sv:value>
>         </sv:property>
>         <sv:property sv:name="sr:cats" sv:type="Reference">
>             <sv:value>6ee74207-39a1-475e-94a7-a781564a8a0f</sv:value>
>         </sv:property>
>         <sv:property sv:name="sr:dateCreated" sv:type="Date">
>             <sv:value>2007-07-06T15:24:41.125+05:30</sv:value>
>         </sv:property>
>         <sv:property sv:name="sr:brank" sv:type="Double">
>             <sv:value>1.0</sv:value>
>         </sv:property>
>         <sv:property sv:name="sr:ownerName" sv:type="String">
>             <sv:value>g</sv:value>
>         </sv:property>
>         <sv:property sv:name="sr:title" sv:type="String">
>             <sv:value>Book One</sv:value>
>         </sv:property>
>         <sv:property sv:name="sr:error" sv:type="String">
>             <sv:value>none</sv:value>
>         </sv:property>
>         <sv:property sv:name="sr:pfxtitle" sv:type="String">
>             <sv:value>T2945A Book One</sv:value>
>         </sv:property>
>         <sv:property sv:name="sr:url" sv:type="String">
>             <sv:value/>
>         </sv:property>
>         <sv:property sv:name="sr:lastModifiedBy" sv:type="String">
>             <sv:value>g</sv:value>
>         </sv:property>
>         <sv:property sv:name="sr:id" sv:type="Long">
>             <sv:value>1</sv:value>
>         </sv:property>
>         <sv:node sv:name="sr:property">
>             <sv:property sv:name="jcr:primaryType" sv:type="Name">
>                 <sv:value>sr:PropofBook</sv:value>
>             </sv:property>
>             <sv:property sv:name="sr:name" sv:type="String">
>                 <sv:value>Book Property</sv:value>
>             </sv:property>
>             <sv:property sv:name="sr:property" sv:type="Reference">
>                 <sv:value>1c78697c-068c-4743-9933-eea91c90097c</sv:value>
>             </sv:property>
>             <sv:property sv:name="sr:type" sv:type="String">
>                 <sv:value>unrestricted</sv:value>
>             </sv:property>
>             <sv:node sv:name="sr:propvalname">
>                 <sv:property sv:name="jcr:primaryType" sv:type="Name">
>                     <sv:value>sr:BookPropValueType</sv:value>
>                 </sv:property>
>                 <sv:property sv:name="sr:name" sv:type="String">
>                     <sv:value>The Shining</sv:value>
>                 </sv:property>
>             </sv:node>
>         </sv:node>
> </sv:node>
>
> The total number of such Book nodes = 4105.  The total number of Property
> nodes = 11006, and hence there will be an equal 11006 property value nodes.
> That makes it a total of 26117 nodes that will be saved on the session.save

based on your sample i compute the following stats:

4105 'sr:BookType' nodes with 15 properties each
11006 'sr:PropofBook' nodes with 4 properties each
11006 'sr:BookPropValueType' nodes with 2 properties each

total:
26117 nodes
127611 properties

i.e. ~150k items
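
as a rough check, the arithmetic above can be reproduced with a small java sketch. the class and method names are made up for illustration; the node counts are the ones you reported and the per-node property counts are read off the sample export, assuming every node of a given type looks like the sample:

```java
// Reproduces the change-set arithmetic from the sample node structure.
// All names here are illustrative; nothing in this class is Jackrabbit API.
final class ChangeSetStats {
    // Node counts reported in the thread.
    static final long BOOK_NODES = 4105;
    static final long PROP_NODES = 11006;
    static final long PROP_VALUE_NODES = 11006;

    // Properties per node, counted from the sample export
    // (assumption: every node of a type carries the same properties).
    static final long PROPS_PER_BOOK = 15;
    static final long PROPS_PER_PROP_NODE = 4;
    static final long PROPS_PER_PROP_VALUE_NODE = 2;

    static long totalNodes() {
        return BOOK_NODES + PROP_NODES + PROP_VALUE_NODES;
    }

    static long totalProperties() {
        return BOOK_NODES * PROPS_PER_BOOK
                + PROP_NODES * PROPS_PER_PROP_NODE
                + PROP_VALUE_NODES * PROPS_PER_PROP_VALUE_NODE;
    }

    // An "item" is either a node or a property.
    static long totalItems() {
        return totalNodes() + totalProperties();
    }

    public static void main(String[] args) {
        System.out.println("nodes:      " + totalNodes());      // 26117
        System.out.println("properties: " + totalProperties()); // 127611
        System.out.println("items:      " + totalItems());      // 153728
    }
}
```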

that's a pretty big change set for a save() operation and i guess
you could/should break it down into smaller sizes.
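
a minimal sketch of what breaking it down could look like, in plain java. `Saver` and `Adder` are hypothetical stand-ins (for a call to `session.save()` and for whatever code adds one book subtree to the session), not jackrabbit types, and the batch size of 500 is an arbitrary assumption that you would have to tune:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: save after every N imported units instead of once at the very end.
final class BatchedSave {
    /** Hypothetical stand-in for a call to session.save(). */
    interface Saver {
        void save() throws Exception;
    }

    /** Hypothetical callback that adds one unit (e.g. one book subtree) to the session. */
    interface Adder<T> {
        void add(T item) throws Exception;
    }

    /** Adds all items, saving after every batchSize items; returns the number of saves. */
    static <T> int importInBatches(List<T> items, int batchSize,
                                   Adder<T> adder, Saver saver) throws Exception {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int saves = 0;
        int pending = 0;
        for (T item : items) {
            adder.add(item);
            pending++;
            if (pending == batchSize) {
                saver.save(); // persist this batch, keeping the change set small
                saves++;
                pending = 0;
            }
        }
        if (pending > 0) {
            saver.save(); // persist the final, partially filled batch
            saves++;
        }
        return saves;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> books = new ArrayList<>();
        for (int i = 0; i < 4105; i++) {
            books.add(i);
        }
        // No-op callbacks: this only demonstrates the batching logic.
        int saves = importInBatches(books, 500, b -> { }, () -> { });
        System.out.println("save() calls: " + saves);
    }
}
```

in a real import the `Adder` would create the book node with its child nodes and the `Saver` would call `save()` on your session.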

i also count 23321 REFERENCE values (in 19216 REFERENCE properties). note that REFERENCE
properties come at a certain cost since ensuring referential integrity
is quite expensive. are you absolutely sure you need to use REFERENCE
properties? you might want to review your data model. please take a look
at david's excellent content modelling rules/recommendations recently
posted on the users list.
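
for what it's worth, the 23321 figure can be reproduced from the sample by counting reference values rather than properties. this is a sketch with made-up names, assuming every book carries `sr:dimvals` with 2 values and `sr:cats` with 1 value, and every property node carries one single-valued `sr:property` reference, exactly as in the sample:

```java
// Counts REFERENCE properties and values implied by the sample structure.
// Names are illustrative only; the per-node counts are assumptions taken
// from the single sample node in the thread.
final class ReferenceCount {
    static final long BOOK_NODES = 4105;
    static final long PROP_NODES = 11006;

    // Per book: sr:dimvals and sr:cats are REFERENCE properties.
    static final long REF_PROPS_PER_BOOK = 2;
    // Per book: sr:dimvals holds 2 values, sr:cats holds 1.
    static final long REF_VALUES_PER_BOOK = 2 + 1;
    // Per property node: sr:property holds a single reference.
    static final long REFS_PER_PROP_NODE = 1;

    static long referenceProperties() {
        return BOOK_NODES * REF_PROPS_PER_BOOK + PROP_NODES * REFS_PER_PROP_NODE;
    }

    static long referenceValues() {
        return BOOK_NODES * REF_VALUES_PER_BOOK + PROP_NODES * REFS_PER_PROP_NODE;
    }

    public static void main(String[] args) {
        System.out.println("reference properties: " + referenceProperties()); // 19216
        System.out.println("reference values:     " + referenceValues());     // 23321
    }
}
```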

> () execution.  The time taken for this save step alone is 13.38 mins.  Is
> this expected and normal?  Or is there some other problem?
>
> Oh, I have moved to a bundle Derby Persistence Manager.  That is giving me
> this 13.38 mins time.  Earlier, when it was not bundle, the time used to be
> 32.35 mins.  I am very happy about this decrease.  But I am still concerned
> that it's taking so long.

how do you add the nodes? can you provide a simple test case or at
least a code snippet of the relevant processing? what are your jvm
heap size settings?

cheers
stefan

>
> Based on Stefan's calculations, it should have been only 26 * 3  = 78
> seconds!
>
> So any help?
>
> On 7/16/07, Stefan Guggisberg <stefan.guggisberg@gmail.com> wrote:
> >
> > hi,
> >
> > On 7/16/07, Sridhar Raman <sridhar.raman@gmail.com> wrote:
> > > Also, how do I switch to bundle persistence?  Currently, this is the
> > > configuration in my workspace.xml file:
> > >
> > > >     <PersistenceManager class="
> > > > org.apache.jackrabbit.core.state.db.DerbyPersistenceManager">
> > > >      <param name="url" value="jdbc:derby:${wsp.home}/db;create=true"/>
> > > >      <param name="schemaObjectPrefix" value="${wsp.name}_"/>
> > > >     </PersistenceManager>
> > > >
> > >
> > > > How do I change it to include the bundle persistence for Derby?
> >
> > while switching to BundleDbPersistenceManager would certainly
> > provide a certain performance gain i doubt that it would solve your
> > issue. you're using an embedded derby db which should provide
> > a decent performance. i just ran a quick test using
> > DerbyPersistenceManager:
> > saving 1000 nodes with 5 string properties each takes
> > about 3 seconds on a 1.9ghz intel macbook pro (i.e. ~12s./4000 nodes).
> >
> > you mentioned that in your case it takes ~32 minutes (!) to save 4000
> > nodes. please tell us more on your data model. are you storing large
> > binary properties?
> > how many properties (and of what type) are you storing per node?
> > can you provide a simple test case?
> >
> > cheers
> > stefan
> >
> > >
> > > Thanks,
> > > Sridhar
> > >
> > > On 7/16/07, Sridhar Raman <sridhar.raman@gmail.com> wrote:
> > > >
> > > > > I use DerbyPersistenceManager and LocalFileSystem.  So would I be able
> > > > > to switch to bundle persistence in this case, and would it be helpful?
> > > >
> > > > On 7/15/07, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On 7/14/07, Sridhar Raman <sridhar.raman@gmail.com> wrote:
> > > > > > > I use Jackrabbit extensively, and one problem that I seem to run into
> > > > > > > a lot of times is when I import data, and save the nodes.  For saving
> > > > > > > 4000 nodes, it almost takes 32 mins to execute the session.save()
> > > > > > > command.  Any way of fixing this?
> > > > > > >
> > > > > > > Is it probably because all my data is getting indexed?  Could I
> > > > > > > somehow specify only specific properties/types to be indexed?
> > > > >
> > > > > > I much more suspect that the time is spent talking to the persistence
> > > > > > store. Are you using an external database for persistence?
> > > > > >
> > > > > > The traditional database persistence managers issue a separate SQL
> > > > > > statement (causing a network roundtrip to the database) for each node
> > > > > > *and* property being saved, which can quickly end up taking a lot of
> > > > > > time especially if the network roundtrip to a database server takes
> > > > > > more than a few milliseconds.
> > > > >
> > > > > Good solutions to this problem are either to switch to the bundle
> > > > > persistence (which uses just a single statement for a node and all
> > > > > > its properties) included in Jackrabbit 1.3 and/or using an embedded
> > > > > database like the default Derby.
> > > > >
> > > > > BR,
> > > > >
> > > > > Jukka Zitting
> > > > >
> > > >
> > > >
> > >
> >
>
