jackrabbit-dev mailing list archives

From Stefan Guggisberg <stefan.guggisb...@gmail.com>
Subject Re: Multirow update/insert/delete issue
Date Fri, 12 Nov 2004 11:45:18 GMT
On Fri, 12 Nov 2004 07:29:05 +0100, Wolfgang Gehner
<wgehner@infonoia.com> wrote:
> Maybe we talk about the same thing in different ways?
> 
> So you do
> 
> dbtransaction.begin()
> insert ... (one row)
> dbtransaction.commit()
> dbtransaction.begin()
> insert ... (one row)
> dbtransaction.commit()
> 
> a thousand times?
> 
> We want to do
> dbtransaction.begin()
> insert .. (one row)
> insert .. (one row)
> insert .. (one row)
> etc..
> dbtransaction.commit()
> ... which I hope you will concede would be more efficient, and
> where we can do a thousand in no time at all, pretty much no matter what the
> underlying database. BTW, what's your configuration?

i tested with hsqldb, auto-commit turned on.
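the difference between the two pseudocode shapes quoted above can be sketched
with a toy stand-in (FakeDb below is purely illustrative, not a jackrabbit or
jdbc class; it just counts commits, which is where the per-row cost shows up):

```java
import java.util.Collections;
import java.util.List;

// toy stand-in for a database connection: it only counts commits.
class FakeDb {
    int commits = 0;
    void begin() { /* start a transaction */ }
    void insert(String row) { /* write one row */ }
    void commit() { commits++; }
}

public class BatchSketch {
    // pattern 1: commit after every row (what jdbc auto-commit does)
    static int perRowCommits(List<String> rows) {
        FakeDb db = new FakeDb();
        for (String row : rows) {
            db.begin();
            db.insert(row);
            db.commit();
        }
        return db.commits;
    }

    // pattern 2: one transaction around the whole batch
    static int batchedCommits(List<String> rows) {
        FakeDb db = new FakeDb();
        db.begin();
        for (String row : rows) {
            db.insert(row);
        }
        db.commit();
        return db.commits;
    }

    public static void main(String[] args) {
        List<String> rows = Collections.nCopies(1000, "node");
        System.out.println(perRowCommits(rows));  // 1000
        System.out.println(batchedCommits(rows)); // 1
    }
}
```

with a real jdbc driver the same contrast is `setAutoCommit(true)` per-row
versus `setAutoCommit(false)` plus a single `commit()` at the end; the 1000
commits are what cost the 1000 log flushes.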

> 
> Of course a user might also *want* to ensure that either all operations
> succeed or none.

the transaction support currently in jackrabbit does not depend on
a persistence manager being transactional.

> 
> ...and we wonder how we can realize this while observing the current
> PersistenceMgr api, and thought you might have an idea. A
> persistenceMgr.store(nodesToUpdate, nodesToInsert, nodesToDelete) would be
> useful for us, but we were also thinking of consuming a save() event so we
> know when to commit.
> 

we are thinking of changing the persistence manager interface to
enable/help implementors to use as much of jackrabbit's code as possible
on top of their own persistence data model. this has a lot of implications
and requires a partial redesign of the current implementation (e.g. the
transaction support is affected), not just adding a bulk persist method
to the PersistenceManager interface.
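just to make the idea concrete: a bulk persist method could look roughly like
the sketch below. none of these names (BulkStore, InMemoryBulkStore) are part
of jackrabbit; they only illustrate the shape of the call, with item states
reduced to plain id strings:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// illustrative only: neither the interface nor the class below exist in
// jackrabbit; they just sketch the proposed bulk store() call.
interface BulkStore {
    // persist all three change sets as one unit: a jdbc-backed
    // implementation would wrap the whole call in a single transaction
    void store(Collection<String> toUpdate,
               Collection<String> toInsert,
               Collection<String> toDelete);
}

public class InMemoryBulkStore implements BulkStore {
    final List<String> rows = new ArrayList<>();

    public void store(Collection<String> toUpdate,
                      Collection<String> toInsert,
                      Collection<String> toDelete) {
        // a jdbc implementation would do begin(); ...; commit() around this,
        // so a failure anywhere would roll the whole change set back
        rows.removeAll(toDelete);
        rows.addAll(toInsert);
        // updates are no-ops here since we only track ids
    }

    public static void main(String[] args) {
        InMemoryBulkStore pm = new InMemoryBulkStore();
        pm.store(List.of(), List.of("a", "b", "c"), List.of());
        pm.store(List.of("a"), List.of("d"), List.of("b"));
        System.out.println(pm.rows); // [a, c, d]
    }
}
```

the point of the single method (rather than three separate calls) is that the
implementation gets the complete change set at once and can choose its own
transaction boundary around it.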

the ease of adapting arbitrary legacy data models hasn't been a design goal
when i started the implementation but i agree that it is certainly a good thing
(as long as it doesn't compromise/limit jackrabbit's current functionality).

there will probably be a method similar to the one you suggested and i'll
keep you posted on the progress of the redesign. is this ok with you?

cheers
stefan

> 
> 
> 
> Wolfgang
> 
> ----- Original Message -----
> From: "Stefan Guggisberg" <stefan.guggisberg@gmail.com>
> To: <jackrabbit-dev@incubator.apache.org>
> Sent: Thursday, November 11, 2004 6:36 PM
> Subject: Re: Multirow update/insert/delete issue
> 
> > On Thu, 11 Nov 2004 12:32:27 +0100, Wolfgang Gehner
> > <wgehner@infonoia.com> wrote:
> > > We're fully aware of the good benchmarks when not using LocalFileSystem.
> > > "3. Object with LocalFileSystem, not surprisingly either, showed the worst
> > >    performance: ca. 30 sec./1000 nodes"
> > >
> > > So there is no criticism implied or intended whatsoever.
> > > I've just taken the analogy that writing to a db is like writing a
> > > thousand files *when it's done one by one*.
> >
> > sorry, i still don't buy this. the jdbc based persistence manager i hacked
> > together is just doing that: if 1000 nodes are added and saved in one call,
> > it is inserting 1000 node records plus 1000 property records *one by one*.
> > i ran the test and it averaged at 3 - 3.5 sec./1000 nodes. in fact it came
> > close to the best results that i got with the b-tree based persistence
> > managers.
> >
> >
> > >
> > > We are new to the Jackrabbit api and wonder how we can wrap multiple node
> > > writes/inserts/or deletes in one db transaction with the current
> > > PersistenceMgr API. When we can do that, performance will be no issue. We
> > > might have PersistentMgr listen to an event emitted by node.save(), and
> > > persist only then? What do you think?
> >
> > the bad performance you are experiencing is imo caused by the data
> > model of your underlying persistence layer, not by the current
> > implementation of jackrabbit. if you send me the schema that you are
> > using for persisting nodes and properties in a rdbms, i will have a look at it.
> >
> > >
> > > Would you like to look at our code as is?
> >
> > sure.
> >
> > regards
> > stefan
> >
> > >
> > > Stefan, we look forward to your recommendation.
> > >
> > > Best regards,
> > >
> > > Wolfgang
> > >
> > >
> > >
> > > ----- Original Message -----
> > > From: "Stefan Guggisberg" <stefan.guggisberg@gmail.com>
> > > To: <jackrabbit-dev@incubator.apache.org>
> > > Sent: Wednesday, November 10, 2004 6:36 PM
> > > Subject: Re: Multirow update/insert/delete issue
> > >
> > > > a few comments/clarifications inline...
> > > >
> > > > On Wed, 10 Nov 2004 17:41:46 +0100, Wolfgang Gehner
> > > > <wgehner@infonoia.com> wrote:
> > > > >
> > > > > As discussed with David offline, when 1000 nodes are inserted, in the
> > > > > current implementation the PersistenceMgr.store() method
> > > > > is called 1000 times. So the XMLPersistenceMgr takes 30 seconds to do
> > > > > those 1000 write operations.
> > > >
> > > > not quite correct: i said that the XML/ObjectPersistenceManager in
> > > > combination with a CQFileSystem takes ca. 5 sec. for adding and saving
> > > > 1000 nodes (that's 2000 write operations, 1000 nodes + 1000 properties).
> > > >
> > > > > A JDBC implementation of the current PersistenceMgr API is "condemned"
> > > > > to do the same thing. We'd really like a way to bundle those 1000 writes
> > > > > into one "transaction", so we can take 2-3 seconds on a relational
> > > > > database rather than 30.
> > > >
> > > > again, a jdbc implementation is *not* condemned to take 30 sec.!
> > > > i hacked a quick&dirty implementation of a jdbc persistence manager (with
> > > > a very *primitive* schema) that took less than 5 sec. for adding and
> > > > saving 1000 nodes.
> > > >
> > > > >
> > > > > So we'd like to throw into the discussion the following thoughts:
> > > > > - how about maintaining an instance of PersistenceMgr (pm) not on
> > > > >   (Persistent)NodeState but on NodeImpl
> > > > > - the implementation of node.save() to collect info on which nodes incl.
> > > > >   children to save and call a persistenceMgr.store(
> > > > >   nodesToUpdate, nodesToInsert, nodesToDelete) just once. That way the
> > > > >   pm could bundle operations in line with the repository requirements.
> > > > >
> > > > > This would make Jackrabbit's persistence model follow the DAO (data
> > > > > access object) pattern as we understand it.
> > > > >
> > > > > Would be pleased to elaborate and discuss. And share our JDBC
> > > > > PersistenceMgr prototype with anyone interested (it passes the current
> > > > > api unit test, but has a very non-optimized ER design and is inflicted
> > > > > with the issue discussed in this message).
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Infonoia S.A.
> > > > > rue de Berne 7
> > > > > 1201 Geneva
> > > > > Tel: +41 22 9000 009
> > > > > Fax: +41 22 9000 018
> > > > > wgehner@infonoia.com
> > > > > http://www.infonoia.com
> > > > >
> > >
> > >
> 
>
