jackrabbit-users mailing list archives

From "Sridhar Raman" <sridhar.ra...@gmail.com>
Subject Re: Saving of nodes takes too long/Indexing configuration
Date Tue, 17 Jul 2007 13:07:06 GMT
Ok, this is the kind of node structure that I have.  BOOK ONE, as shown in
the example, is the basic unit.  It can have multiple Property nodes, and
each Property node has exactly one Property Value node.
<sv:node sv:name="BOOK ONE">
        <sv:property sv:name="jcr:primaryType" sv:type="Name">
            <sv:value>sr:BookType</sv:value>
        </sv:property>
        <sv:property sv:name="sr:lastModifiedOn" sv:type="Date">
            <sv:value>2007-07-06T15:24:41.125+05:30</sv:value>
        </sv:property>
        <sv:property sv:name="sr:data" sv:type="String">
            <sv:value/>
        </sv:property>
        <sv:property sv:name="sr:dimvals" sv:type="Reference">
            <sv:value>5480a736-ec58-459e-b796-4faf6be581a9</sv:value>
            <sv:value>b8c478c6-c395-4f7e-b049-91724bd35324</sv:value>
        </sv:property>
        <sv:property sv:name="sr:state" sv:type="String">
            <sv:value>success</sv:value>
        </sv:property>
        <sv:property sv:name="sr:cats" sv:type="Reference">
            <sv:value>6ee74207-39a1-475e-94a7-a781564a8a0f</sv:value>
        </sv:property>
        <sv:property sv:name="sr:dateCreated" sv:type="Date">
            <sv:value>2007-07-06T15:24:41.125+05:30</sv:value>
        </sv:property>
        <sv:property sv:name="sr:brank" sv:type="Double">
            <sv:value>1.0</sv:value>
        </sv:property>
        <sv:property sv:name="sr:ownerName" sv:type="String">
            <sv:value>g</sv:value>
        </sv:property>
        <sv:property sv:name="sr:title" sv:type="String">
            <sv:value>Book One</sv:value>
        </sv:property>
        <sv:property sv:name="sr:error" sv:type="String">
            <sv:value>none</sv:value>
        </sv:property>
        <sv:property sv:name="sr:pfxtitle" sv:type="String">
            <sv:value>T2945A Book One</sv:value>
        </sv:property>
        <sv:property sv:name="sr:url" sv:type="String">
            <sv:value/>
        </sv:property>
        <sv:property sv:name="sr:lastModifiedBy" sv:type="String">
            <sv:value>g</sv:value>
        </sv:property>
        <sv:property sv:name="sr:id" sv:type="Long">
            <sv:value>1</sv:value>
        </sv:property>
        <sv:node sv:name="sr:property">
            <sv:property sv:name="jcr:primaryType" sv:type="Name">
                <sv:value>sr:PropofBook</sv:value>
            </sv:property>
            <sv:property sv:name="sr:name" sv:type="String">
                <sv:value>Book Property</sv:value>
            </sv:property>
            <sv:property sv:name="sr:property" sv:type="Reference">
                <sv:value>1c78697c-068c-4743-9933-eea91c90097c</sv:value>
            </sv:property>
            <sv:property sv:name="sr:type" sv:type="String">
                <sv:value>unrestricted</sv:value>
            </sv:property>
            <sv:node sv:name="sr:propvalname">
                <sv:property sv:name="jcr:primaryType" sv:type="Name">
                    <sv:value>sr:BookPropValueType</sv:value>
                </sv:property>
                <sv:property sv:name="sr:name" sv:type="String">
                    <sv:value>The Shining</sv:value>
                </sv:property>
            </sv:node>
        </sv:node>
</sv:node>

The total number of such Book nodes is 4105.  The total number of Property
nodes is 11006, and hence there are an equal 11006 Property Value nodes.
That makes a total of 26117 nodes that are saved by the session.save()
call.  The time taken for this save step alone is 13.38 mins.  Is
this expected and normal?  Or is there some other problem?

Oh, I have moved to a bundle Derby Persistence Manager.  That is giving me
this 13.38 mins time.  Earlier, when it was not bundle, the time used to be
32.35 mins.  I am very happy about this decrease.  But I am still concerned
that it's taking so long.
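
As a rough way to see why the bundle persistence manager helps, the sketch below counts SQL statements under the two models described further down in this thread: one statement per node and per property for the non-bundle manager, versus one statement per node for the bundle manager. The class and method names are hypothetical, the per-node property counts are simply read off the BOOK ONE export above (15, 4 and 2), and it assumes every node of a given type carries the same number of properties. It is an estimate, not a measurement of what Jackrabbit actually issues.

```java
final class StatementCountEstimate {
    // Node counts reported in this thread.
    static final long BOOKS = 4105;
    static final long PROPERTY_NODES = 11006;
    static final long VALUE_NODES = 11006;

    // Property counts per node, read off the BOOK ONE export
    // (assumption: all nodes of a type look like the example).
    static final long PROPS_PER_BOOK = 15;
    static final long PROPS_PER_PROPERTY_NODE = 4;
    static final long PROPS_PER_VALUE_NODE = 2;

    static long nodes() {
        return BOOKS + PROPERTY_NODES + VALUE_NODES;
    }

    static long properties() {
        return BOOKS * PROPS_PER_BOOK
                + PROPERTY_NODES * PROPS_PER_PROPERTY_NODE
                + VALUE_NODES * PROPS_PER_VALUE_NODE;
    }

    // Non-bundle model: one statement per node plus one per property.
    static long perItemStatements() {
        return nodes() + properties();
    }

    // Bundle model: one statement per node, properties included.
    static long bundleStatements() {
        return nodes();
    }

    public static void main(String[] args) {
        System.out.println("nodes:               " + nodes());
        System.out.println("properties:          " + properties());
        System.out.println("per-item statements: " + perItemStatements());
        System.out.println("bundle statements:   " + bundleStatements());
    }
}
```

Under these assumptions the per-item model needs several times as many statements as the bundle model. The measured improvement (32.35 to 13.38 minutes) is smaller than that ratio, which may indicate that statement count is not the only cost in the save.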

Based on Stefan's calculations, it should have been only 26 * 3  = 78
seconds!
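
The arithmetic behind these figures can be checked with a short sketch. The class and method names are made up for illustration. The inputs are the counts and timings quoted in this thread, "13.38 mins" is assumed to mean decimal minutes, and the figure of about 3 seconds per 1000 nodes comes from Stefan's quick test with 5 string properties per node, so it is only a loose baseline for nodes shaped like BOOK ONE.

```java
final class SaveTimeEstimate {
    // Counts reported in this thread.
    static final int BOOKS = 4105;
    static final int PROPERTY_NODES = 11006;
    static final int VALUE_NODES = 11006; // one value node per property node

    // Observed save time (assumption: decimal minutes).
    static final double OBSERVED_MINUTES = 13.38;

    static int totalNodes() {
        return BOOKS + PROPERTY_NODES + VALUE_NODES;
    }

    // Stefan's benchmark: about 3 seconds per 1000 nodes.
    static double expectedSeconds(int nodes) {
        return nodes / 1000.0 * 3.0;
    }

    static double observedSeconds() {
        return OBSERVED_MINUTES * 60.0;
    }

    // Average observed cost of saving one node, in milliseconds.
    static double observedMillisPerNode() {
        return observedSeconds() / totalNodes() * 1000.0;
    }

    public static void main(String[] args) {
        int nodes = totalNodes();
        System.out.printf("total nodes:       %d%n", nodes);
        System.out.printf("expected seconds:  %.1f%n", expectedSeconds(nodes));
        System.out.printf("observed seconds:  %.1f%n", observedSeconds());
        System.out.printf("observed ms/node:  %.2f%n", observedMillisPerNode());
        System.out.printf("observed/expected: %.1f%n",
                observedSeconds() / expectedSeconds(nodes));
    }
}
```

With these inputs the observed save is roughly ten times slower than the benchmark-based estimate.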

So any help?

On 7/16/07, Stefan Guggisberg <stefan.guggisberg@gmail.com> wrote:
>
> hi,
>
> On 7/16/07, Sridhar Raman <sridhar.raman@gmail.com> wrote:
> > Also, how do I switch to bundle persistence?  Currently, this is the
> > configuration in my workspace.xml file:
> >
> > >     <PersistenceManager class="org.apache.jackrabbit.core.state.db.DerbyPersistenceManager">
> > >      <param name="url" value="jdbc:derby:${wsp.home}/db;create=true"/>
> > >      <param name="schemaObjectPrefix" value="${wsp.name}_"/>
> > >     </PersistenceManager>
> > >
> >
> > How do I change it to include the bundle persistence for Derby?
>
> while switching to BundleDbPersistenceManager would certainly
> provide a certain performance gain i doubt that it would solve your
> issue. you're using an embedded derby db which should provide
> a decent performance. i just ran a quick test using
> DerbyPersistenceManager:
> saving 1000 nodes with 5 string properties each takes
> about 3 seconds on a 1.9ghz intel macbook pro (i.e. ~12s./4000 nodes).
>
> you mentioned that in your case it takes ~32 minutes (!) to save 4000 nodes.
> please tell us more on your data model. are you storing large binary properties?
> how many properties (and of what type) are you storing per node?
> can you provide a simple test case?
>
> cheers
> stefan
>
> >
> > Thanks,
> > Sridhar
> >
> > On 7/16/07, Sridhar Raman <sridhar.raman@gmail.com> wrote:
> > >
> > > I use DerbyPersistenceManager and LocalFileSystem.  So would I be able to
> > > switch to bundle persistence in this case, and would it be helpful?
> > >
> > > On 7/15/07, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On 7/14/07, Sridhar Raman <sridhar.raman@gmail.com> wrote:
> > > > > I use Jackrabbit extensively, and one problem that I seem to run into a lot
> > > > > of times is when I import data, and save the nodes.  For saving 4000 nodes,
> > > > > it almost takes 32 mins to execute the session.save() command.  Any way of
> > > > > fixing this?
> > > > >
> > > > > Is it probably because all my data is getting indexed?  Could I somehow
> > > > > specify only specific properties/types to be indexed?
> > > >
> > > > I much more suspect that the time is spent talking to the persistence
> > > > store. Are you using an external database for persistence?
> > > >
> > > > The traditional database persistence managers issue a separate SQL
> > > > statement (causing a network roundtrip to the database) for each node
> > > > *and* property being saved, which can quickly end up taking a lot of
> > > > time especially if the network roundtrip to a database server takes
> > > > more than a few milliseconds.
> > > >
> > > > Good solutions to this problem are either to switch to the bundle
> > > > persistence (which uses just a single statement for a node and all
> > > > its properties) included in Jackrabbit 1.3 and/or using an embedded
> > > > database like the default Derby.
> > > >
> > > > BR,
> > > >
> > > > Jukka Zitting
> > > >
> > >
> > >
> >
>
