jackrabbit-dev mailing list archives

From "Wolfgang Gehner" <wgeh...@infonoia.com>
Subject Re: Multirow update/insert/delete issue
Date Fri, 12 Nov 2004 12:15:30 GMT
That's great!

Best regards,

Wolfgang

----- Original Message ----- 
From: "Stefan Guggisberg" <stefan.guggisberg@gmail.com>
To: <jackrabbit-dev@incubator.apache.org>
Sent: Friday, November 12, 2004 12:45 PM
Subject: Re: Multirow update/insert/delete issue


> On Fri, 12 Nov 2004 07:29:05 +0100, Wolfgang Gehner
> <wgehner@infonoia.com> wrote:
> > Maybe we talk about the same thing in different ways?
> >
> > So you do
> >
> > dbtransaction.begin()
> > insert ... (one row)
> > dbtransaction.commit()
> > dbtransaction.begin()
> > insert ... (one row)
> > dbtransaction.commit()
> >
> > a thousand times?
> >
> > We want to do
> > dbtransaction.begin()
> > insert .. (one row)
> > insert .. (one row)
> > insert .. (one row)
> > etc..
> > dbtransaction.commit()
> > ... which I hope you will concede would be more efficient, and where
> > we can do a thousand in no time at all, pretty much no matter what
> > the underlying database. BTW, what's your configuration?
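[The difference between the two call patterns above can be sketched with a toy model. This is not Jackrabbit or JDBC code; `ToyDb` and its methods are invented purely to count how many commits each pattern issues, since the commit (log flush / fsync) is the expensive step.]

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Jackrabbit code): counts how many commits each
// call pattern issues for N inserts. The commit is the expensive
// operation, so fewer commits means faster bulk inserts.
public class CommitCountDemo {

    static class ToyDb {
        int commits = 0;
        List<String> pending = new ArrayList<>();
        List<String> stored = new ArrayList<>();

        void insert(String row) { pending.add(row); }

        void commit() {
            stored.addAll(pending);
            pending.clear();
            commits++;
        }
    }

    // Pattern 1: commit after every row (auto-commit style).
    static int perRowCommits(int n) {
        ToyDb db = new ToyDb();
        for (int i = 0; i < n; i++) {
            db.insert("row" + i);
            db.commit();
        }
        return db.commits;
    }

    // Pattern 2: one transaction around all rows.
    static int batchedCommits(int n) {
        ToyDb db = new ToyDb();
        for (int i = 0; i < n; i++) {
            db.insert("row" + i);
        }
        db.commit();
        return db.commits;
    }

    public static void main(String[] args) {
        System.out.println(perRowCommits(1000));  // 1000 commits
        System.out.println(batchedCommits(1000)); // 1 commit
    }
}
```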
>
> i tested with hsqldb, auto-commit turned on.
>
> >
> > Of course a user might also *want* to ensure that either all operations
> > succeed or none.
>
> the transaction support currently in jackrabbit does not depend on
> a persistence manager being transactional.
>
> >
> > ...and we wonder how we can realize this observing the current
> > PersistenceMgr api, and thought you might have an idea. A
> > persistenceMgr.store(nodesToUpdate, nodesToInsert, nodesToDelete)
> > would be useful for us, but we were also thinking of consuming a
> > save() event so we know when to commit.
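[A minimal sketch of the bulk-store call proposed here. None of these names come from the actual Jackrabbit API; `NodeState`, `PersistenceManager`, and `store` are stand-ins that only illustrate the shape of the call: three change sets handed over in a single invocation.]

```java
import java.util.List;

// Hypothetical sketch of the proposed bulk-store method. All names
// are invented for illustration; this is not the Jackrabbit API.
public class BulkStoreSketch {

    // Stand-in for a persisted node's state.
    static class NodeState {
        final String id;
        NodeState(String id) { this.id = id; }
    }

    interface PersistenceManager {
        // One call per save(): the implementation can open a single
        // transaction, apply all three change sets, and commit once.
        void store(List<NodeState> nodesToUpdate,
                   List<NodeState> nodesToInsert,
                   List<String> nodesToDelete);
    }

    // Exercises the interface: three change sets, one store() call.
    static int demo() {
        final int[] calls = {0};
        PersistenceManager pm = (upd, ins, del) -> calls[0]++;
        pm.store(List.of(new NodeState("a")),
                 List.of(new NodeState("b"), new NodeState("c")),
                 List.of("d"));
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println("store() called " + demo() + " time(s)");
    }
}
```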
> >
>
> we are thinking of changing the persistence manager interface to
> enable/help implementors using as much of jackrabbit's code as possible
> on top of their own persistence data model. this has a lot of implications
> and requires a partial redesign of the current implementation (e.g. the
> transaction support is affected), not just adding a bulk persist method
> to the PersistenceManager interface.
>
> the ease of adapting arbitrary legacy data models hasn't been a design
> goal when i started the implementation but i agree that it is certainly
> a good thing (as long as it doesn't compromise/limit jackrabbit's current
> functionality).
>
> there will probably be a method similar to the one you suggested and
> i'll keep you posted on the progress of the redesign.  is this ok with you?
>
> cheers
> stefan
>
> >
> >
> >
> > Wolfgang
> >
> > ----- Original Message -----
> > From: "Stefan Guggisberg" <stefan.guggisberg@gmail.com>
> > To: <jackrabbit-dev@incubator.apache.org>
> > Sent: Thursday, November 11, 2004 6:36 PM
> > Subject: Re: Multirow update/insert/delete issue
> >
> > > On Thu, 11 Nov 2004 12:32:27 +0100, Wolfgang Gehner
> > > <wgehner@infonoia.com> wrote:
> > > > We're fully aware of the good benchmarks when not using LocalFileSystem.
> > > > "3. Object with LocalFileSystem, not surprisingly either, showed the
> > > >    worst performance: ca. 30 sec./1000 nodes"
> > > >
> > > > So there is no criticism implied or intended whatsoever.
> > > > I've just taken the analogy that writing to a db is like writing
> > > > a thousand files *when it's done one by one*.
> > >
> > > sorry, i still don't buy this. the jdbc based persistence manager
> > > i hacked together is just doing that: if 1000 nodes are added and
> > > saved in one call, it is inserting 1000 node records plus 1000
> > > property records *one by one*. i ran the test and it averaged at
> > > 3 - 3.5 sec./1000 nodes. in fact it came close to the best results
> > > that i got with the b-tree based persistence managers.
> > >
> > >
> > > >
> > > > We are new to the Jackrabbit api and wonder how we can wrap multiple
> > > > node writes/inserts/or deletes in one db transaction with the current
> > > > PersistenceMgr API. When we can do that, performance will be no issue.
> > > > We might have PersistentMgr listen to an event emitted by node.save(),
> > > > and persist only then? What do you think?
> > >
> > > the bad performance you are experiencing is imo caused by the data
> > > model of your underlying persistence layer, not by the current
> > > implementation of jackrabbit. if you send me the schema that you are
> > > using for persisting nodes and properties in a rdbms, i will have a
> > > look at it.
> > > >
> > > > Would you like to look at our code as is?
> > >
> > > sure.
> > >
> > > regards
> > > stefan
> > >
> > > >
> > > > Stefan, we look forward to your recommendation.
> > > >
> > > > Best regards,
> > > >
> > > > Wolfgang
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > > From: "Stefan Guggisberg" <stefan.guggisberg@gmail.com>
> > > > To: <jackrabbit-dev@incubator.apache.org>
> > > > Sent: Wednesday, November 10, 2004 6:36 PM
> > > > Subject: Re: Multirow update/insert/delete issue
> > > >
> > > > > a few comments/clarifications inline...
> > > > >
> > > > > On Wed, 10 Nov 2004 17:41:46 +0100, Wolfgang Gehner
> > > > > <wgehner@infonoia.com> wrote:
> > > > > >
> > > > > > As discussed with David offline, when 1000 nodes are inserted,
> > > > > > in the current implementation the PersistenceMgr.store() method
> > > > > > is called a 1000 times. So the XMLPersistenceMgr takes 30
> > > > > > seconds to do those 1000 write operations.
> > > > >
> > > > > not quite correct: i said that the XML/ObjectPersistenceManager
> > > > > in combination with a CQFileSystem takes ca. 5 sec. for adding
> > > > > and saving 1000 nodes (that's 2000 write operations, 1000 nodes
> > > > > + 1000 properties).
> > > > >
> > > > > > A JDBC implementation of the current PersistenceMgr API is
> > > > > > "condemned" to do the same thing. We'd really look for a way to
> > > > > > bundle those 1000 writes into one "transaction", so we can take
> > > > > > 2-3 seconds on a relational database rather than 30.
> > > > >
> > > > > again, a jdbc implementation is *not* condemned to take 30 sec.!
> > > > > i hacked a quick&dirty implementation of a jdbc persistence
> > > > > manager (with a very *primitive* schema) that took less than
> > > > > 5 sec. for adding and saving 1000 nodes.
> > > > >
> > > > > >
> > > > > > So we'd like to throw into the discussion the following thoughts:
> > > > > > - how about maintaining an instance of PersistenceMgr (pm) not
> > > > > >   on (Persistent)NodeState but on NodeImpl
> > > > > > - the implementation of node.save() to collect info on what
> > > > > >   nodes incl. children to save and call a persistenceMgr.store(
> > > > > >   nodesToUpdate, nodesToInsert, nodesToDelete) just once. That
> > > > > >   way the pm could bundle operations in line with the
> > > > > >   repository requirements.
> > > > > >
> > > > > > This would make Jackrabbit's persistence model follow the DAO
> > > > > > (data access object) pattern as we understand it.
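[The DAO-style save() proposed here can be sketched as follows: walk the subtree once, sort dirty nodes into change sets, then hand everything to the persistence layer in a single call. Every name below (`Node`, `collect`, `demo`) is hypothetical; only the traversal-then-bulk-store idea comes from the thread.]

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a DAO-style save(): collect the whole change set of a
// subtree first, then make exactly one bulk persistence call instead
// of one call per node. All names are invented for illustration.
public class BatchedSaveSketch {

    static class Node {
        final String id;
        final boolean isNew;    // not yet persisted -> insert
        final boolean isDirty;  // persisted but modified -> update
        final List<Node> children = new ArrayList<>();
        Node(String id, boolean isNew, boolean isDirty) {
            this.id = id; this.isNew = isNew; this.isDirty = isDirty;
        }
    }

    // Walk the subtree and sort changes into inserts vs. updates.
    static void collect(Node n, List<String> inserts, List<String> updates) {
        if (n.isNew) inserts.add(n.id);
        else if (n.isDirty) updates.add(n.id);
        for (Node c : n.children) collect(c, inserts, updates);
    }

    // Returns {inserts, updates}; a real save() would now make exactly
    // one pm.store(nodesToUpdate, nodesToInsert, nodesToDelete) call.
    static int[] demo() {
        Node root = new Node("root", false, true);     // modified
        root.children.add(new Node("a", true, false)); // new
        root.children.add(new Node("b", true, false)); // new
        List<String> inserts = new ArrayList<>();
        List<String> updates = new ArrayList<>();
        collect(root, inserts, updates);
        return new int[] { inserts.size(), updates.size() };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " inserts, " + r[1]
                + " updates in one store() call");
    }
}
```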
> > > > > >
> > > > > > Would be pleased to elaborate and discuss. And share our JDBC
> > > > > > PersistenceMgr prototype with anyone interested (it passes the
> > > > > > current api unit test, but has a very non-optimized ER design
> > > > > > and is afflicted with the issue discussed in this message).
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Infonoia S.A.
> > > > > > rue de Berne 7
> > > > > > 1201 Geneva
> > > > > > Tel: +41 22 9000 009
> > > > > > Fax: +41 22 9000 018
> > > > > > wgehner@infonoia.com
> > > > > > http://www.infonoia.com
> > > > > >
> > > >
> > > >
> >
> >

