Mailing-List: contact dev-help@jackrabbit.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@jackrabbit.apache.org
Received-SPF: pass (asf.osuosl.org: domain of davek@us.ibm.com designates
 32.97.110.151 as permitted sender)
In-Reply-To: <fcbb46050605240903m69f05df8n2b9e1a27b3c562ae@mail.gmail.com>
To: dev@jackrabbit.apache.org
Subject: Re: Google Summer of Code project for Jackrabbit
MIME-Version: 1.0
From: David Kennedy <davek@us.ibm.com>
Message-ID: 
 <OF380B63DD.F0EB4C20-ON85257178.0060220D-85257178.006256DB@us.ibm.com>
Date: Wed, 24 May 2006 13:58:05 -0400
Content-Type: multipart/alternative;
 boundary="=_alternative 0062563F85257178_="

--=_alternative 0062563F85257178_=
Content-Type: text/plain; charset="US-ASCII"

"Nicolas Toper" <ntoper@gmail.com> wrote on 05/24/2006 12:03:02 PM:

> You are right: there are two different issues there so possibly two
> different locking strategies: one for backup and one for restoring.
> 
> We want to cover for now only one use case: the regular backup of data.
> You would call the tool through a cron everyday at midnight for instance.
> 
> There is another use case, we need to work on later (after the first
> project for Google is completed): exceptional backup. For instance, before
> a migration between different version. From my point of view, I would like
> 

Missed the complete thought here...

A regular backup is fine, but I don't expect people to stop using the 
repository while the backup is occurring.  Anything we can do to minimize 
the locked set is advantageous to consumers.

I'm not sure how an exceptional backup differs from a regular backup other
than when it is performed.

> Are those use cases covering your needs?
> 
> Backup
> We need to have a coherent image of the system. For a given workspace, is
> the read access always coherent and consistant? I would prefer to avoid
> locking the workspace since we are backuping a lot of data and it is going
> to take some times.
> 

The read access should be coherent and consistent assuming a transactional
system.  A locked workspace is the easiest to deal with, but the most
inconvenient for consumers.  Minimizing the lock set adds complexity
because transactions that occur during the backup can affect data
consistency.  References are one example: if Node A has a property that
references Node B, we need to ensure that when A gets backed up, B and all
of its parents are backed up too, or B may no longer exist by the time the
backup gets to it.
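
To make the reference case concrete, here is a minimal sketch of computing
the set of nodes that would have to be captured together with A. It is only
an illustration under assumptions: SketchNode and BackupClosure are
hypothetical in-memory stand-ins, not Jackrabbit or JCR classes, and a real
implementation would still need some way to keep this set stable while it
is being read.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory stand-in for a repository node (not the JCR API). */
final class SketchNode {
    final String name;
    final SketchNode parent;                                // null for the root
    final List<SketchNode> references = new ArrayList<>();  // targets of reference properties

    SketchNode(String name, SketchNode parent) {
        this.name = name;
        this.parent = parent;
    }
}

final class BackupClosure {
    /**
     * Returns the start node plus everything reachable from it by
     * following parent links and reference targets transitively.
     */
    static Set<SketchNode> closureOf(SketchNode start) {
        Set<SketchNode> closure = new LinkedHashSet<>();
        Deque<SketchNode> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            SketchNode node = pending.pop();
            if (!closure.add(node)) {
                continue; // already collected, also guards against reference cycles
            }
            if (node.parent != null) {
                pending.push(node.parent);
            }
            for (SketchNode target : node.references) {
                pending.push(target);
            }
        }
        return closure;
    }
}
```

With A under /docs referencing B under /lib, the closure holds A, docs,
B, lib and the root; only that set would need to be locked or snapshotted,
not the whole workspace.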

> A workspace lock might not even solve all issues. Have anyone of you
> already solved this? If yes how?
> 
> Restore
> I propose to actually lock the workspace while restoring: a restore
> operation is rare and would delete the previous content. There shouldn't
> be any other operations taking place.
> 

Seems like a reasonable starting point.  Will the restore take a Visitor
to give requestors the ability to REPLACE or IGNORE existing nodes or
properties?
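
A rough sketch of what I mean, assuming a callback that is consulted for
every item that already exists in the target. All names here
(CollisionAction, CollisionVisitor, RestoreSketch) are made up for
illustration, and the workspace is modelled as a plain path-to-value map
rather than through the real repository API.

```java
import java.util.Map;

/** Hypothetical decision a requestor returns for an item that already exists. */
enum CollisionAction { REPLACE, IGNORE }

/** Hypothetical callback, consulted once per path present in both backup and workspace. */
interface CollisionVisitor {
    CollisionAction onExisting(String path, String currentValue, String backupValue);
}

final class RestoreSketch {
    /** Copies backup entries into the workspace; returns the number of entries written. */
    static int restore(Map<String, String> backup,
                       Map<String, String> workspace,
                       CollisionVisitor visitor) {
        int written = 0;
        for (Map.Entry<String, String> entry : backup.entrySet()) {
            String path = entry.getKey();
            if (workspace.containsKey(path)
                    && visitor.onExisting(path, workspace.get(path), entry.getValue())
                            == CollisionAction.IGNORE) {
                continue; // keep what is already in the workspace
            }
            workspace.put(path, entry.getValue());
            written++;
        }
        return written;
    }
}
```

A requestor could then, for example, keep everything under /config as it
is and let the backup win everywhere else.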

> Is this approach correct?
> 
> Nicolas
> 
> 
> On 5/24/06, David Kennedy <davek@us.ibm.com > wrote:
> >
> > Is the intent to lock out the entire workspace while backup or restore is
> > occurring......no writes?!  Can the transactions that occur during backup
> > be recorded and played back to the backup system to avoid having to lock
> > the entire workspace?
> >
> > Restore is a different issue though....which takes precedence: non-restore
> > operations that occur during restore or the nodes from the backup?  Is it
> > feasible to restore clusters (transitive closure containing references) of
> > nodes?  Ideally we wouldn't have to lock the entire workspace.
> >
> > David
> >
> >
> >
> > "Nicolas Toper" < ntoper@gmail.com>
> > 05/24/2006 10:12 AM
> > Please respond to
> > dev@jackrabbit.apache.org
> >
> >
> > To
> > dev@jackrabbit.apache.org, tobias.bocanegra@day.com
> > cc
> >
> > Subject
> > Re: Google Summer of Code project for Jackrabbit
> >
> >
> >
> >
> >
> >
> > Hi Tobias,
> >
> > Thanks for your feedback.
> >
> > I assume I would need to lock the workspace using the getLockManager of
> > WorkspaceImpl.
> >
> > Are those lock "time outted"? I mean for instance the backup application
> > crashes, I had a lock on a Workspace, I would need to clean it as soon as
> > possible. If they are not, maybe we should add it in JackRabbit through
> > for instance a lease.
> >
> > What do you think?
> >
> > nico
> > My blog! http://www.deviant-abstraction.net !!
> >
> >
> > On 5/24/06, Tobias Bocanegra < tobias.bocanegra@day.com > wrote:
> > >
> > > hi nicolas,
> > > sounds very promising. just one important comment so far:
> > >
> > > > - All updates and read are isolated through transactions. Do we need
> > > > to define a locking strategy? If I am correct, I can read a node even
> > > > though it is locked and it is threadsafe. You don't commit an
> > > > incoherent modification.
> > >
> > > thats not quite true, they are only "READ COMMITTED" isolated. i.e.
> > > you need to lock the entire workspace before you can perform a backup.
> > >
> > > regards, tobi
> > > --
> > > -----------------------------------------< tobias.bocanegra@day.com >---
> > > Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
> > > T +41 61 226 98 98, F +41 61 226 98 97
> > > -----------------------------------------------< http://www.day.com >---
> > >
> >
> >
> >
> 
> 
> -- 
> a+
> nico
> My blog! http://www.deviant-abstraction.net !!

--=_alternative 0062563F85257178_=--