jackrabbit-dev mailing list archives

From "Nicolas Toper" <nto...@gmail.com>
Subject Re: Google Summer of Code project for Jackrabbit
Date Wed, 24 May 2006 23:16:29 GMT
Hi David,

Sorry to have been unclear.

What I meant is that we have two different kinds of backup to perform.

In the first use case, which I call "regular backup", you run the backup
every night. You do not mind missing content that was updated while the
backup ran, since you will capture it the next day.

In the second use case, which I call "exceptional backup", you want all the
data because, for instance, you are going to destroy the repository
afterwards.

I think the two differ only in small points. For instance, for a "regular
backup" we don't care about transactions that have started but not yet
committed; for an exceptional backup, we do.

I propose to support only the first use case for now. The second one could
easily be added later.

I don't know how Jackrabbit is used in production environments. Is it
feasible to lock one workspace at a time, or is that too cumbersome for
customers?

For instance, if backing up a workspace requires locking it for two minutes,
then it can be done without affecting availability (but it would affect
reliability). We need data to estimate whether it is needed. Could you give
me the size of a typical workspace, please?

I am OK with recording transactions and committing them after the lock has
been released, but this means changing the semantics of Jackrabbit (a
transaction started while a lock is held would be performed after the lock
is released instead of raising an exception), and I am not sure everybody
would think that is a good idea. We would need to add a transaction log (is
there one already?) and parse transactions to detect conflicts (or maybe
capture exceptions). We would no longer be able to guarantee that a
transaction is persistent, and it might have an impact on performance. And
what about timeouts while a transaction is running?
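To make the record-and-replay idea concrete, here is a minimal sketch in
plain Java. It uses no Jackrabbit classes; DeferredWriteLog, beginBackup,
submit and endBackup are hypothetical names. Writes submitted while a backup
is running are queued instead of raising an exception, then replayed in
order once the backup ends. It also shows the weak spots listed above: the
caller returns before its write is applied (so it is not yet persistent),
and nothing checks for conflicts during replay.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch, not a Jackrabbit API: writes arriving during a backup
// are queued instead of raising an exception, then replayed in order.
class DeferredWriteLog {
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private boolean backupInProgress = false;

    // Marks the start of a backup; subsequent writes are queued.
    synchronized void beginBackup() {
        backupInProgress = true;
    }

    // Applies the write immediately, or queues it while a backup runs.
    // Note: a queued write is NOT yet persistent when this method returns.
    synchronized void submit(Runnable write) {
        if (backupInProgress) {
            pending.add(write);
        } else {
            write.run();
        }
    }

    // Ends the backup and replays queued writes in submission order.
    // No conflict detection is performed here.
    synchronized void endBackup() {
        backupInProgress = false;
        while (!pending.isEmpty()) {
            pending.poll().run();
        }
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        DeferredWriteLog log = new DeferredWriteLog();
        log.submit(() -> store.add("a")); // no backup running: applied now
        log.beginBackup();
        log.submit(() -> store.add("b")); // backup running: queued
        System.out.println(store + " pending=" + log.pendingCount()); // [a] pending=1
        log.endBackup();
        System.out.println(store); // [a, b]
    }
}
```

A real version would also need a durable log and a timeout policy for
queued transactions, which is exactly where the semantic change becomes
visible to clients.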

Another idea would be to monitor Jackrabbit and launch the backup when there
is a high probability that no transactions will be started. But I think
sysadmins already know when the load on their systems is minimal.

Another idea, as Miro stated, would be to use a lower-level strategy
(working at the DB level or directly on the file system). That was actually
my first backup strategy, but Jukka thought we need to be able to use the
tool to migrate from one persistence manager (PM) to another.

Here is my suggestion on the locking strategy: we can extend the backup tool
later if needed. Right now, even with a global lock, it is an improvement
over the current situation. And I need to release the project before
August 21.

I would prefer to start by locking one workspace at a time and, if I still
have time, then find a way to work with a minimal lock. I will most probably
keep working on Jackrabbit after Google SoC is over. Are you OK with this
approach?
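A minimal sketch of "one workspace at a time", assuming a JVM-local
ReentrantReadWriteLock per workspace stands in for whatever workspace lock
Jackrabbit would actually provide (WorkspaceBackup, write, isLocked and
backupAll are hypothetical names). Ordinary writes take the shared side of
the lock, so they never block each other; the backup takes the exclusive
side of a single workspace, exports it, and releases it before moving on, so
at most one workspace is closed to writes at any moment.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical sketch: one read/write lock per workspace. Normal writers share
// the read side; the backup takes the write side of ONE workspace at a time.
class WorkspaceBackup {
    private final Map<String, ReentrantReadWriteLock> locks = new LinkedHashMap<>();

    WorkspaceBackup(List<String> workspaceNames) {
        for (String name : workspaceNames) {
            locks.put(name, new ReentrantReadWriteLock());
        }
    }

    // Wraps an ordinary change; blocks only while this workspace is backed up.
    void write(String workspace, Runnable change) {
        ReentrantReadWriteLock.ReadLock shared = locks.get(workspace).readLock();
        shared.lock();
        try {
            change.run();
        } finally {
            shared.unlock();
        }
    }

    // True while the backup holds the exclusive lock on this workspace.
    boolean isLocked(String workspace) {
        return locks.get(workspace).isWriteLocked();
    }

    // Exports each workspace in turn; the exporter runs while only that
    // workspace is locked. Returns workspace name -> exported data.
    Map<String, String> backupAll(Function<String, String> exporter) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, ReentrantReadWriteLock> entry : locks.entrySet()) {
            ReentrantReadWriteLock.WriteLock exclusive = entry.getValue().writeLock();
            exclusive.lock();
            try {
                result.put(entry.getKey(), exporter.apply(entry.getKey()));
            } finally {
                exclusive.unlock();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        WorkspaceBackup backup = new WorkspaceBackup(List.of("default", "archive"));
        Map<String, String> dumps = backup.backupAll(ws ->
                "default locked=" + backup.isLocked("default")
                + ", archive locked=" + backup.isLocked("archive"));
        dumps.forEach((ws, dump) -> System.out.println(ws + ": " + dump));
    }
}
```

The exporter function is a placeholder for the real export, for example
something built on JCR's Session.exportSystemView. The lock duration per
workspace is then roughly the export time of that workspace alone, which is
why the size of a typical workspace matters.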

We agree on the restore operation. Good idea for the REPLACE or IGNORE
option, but I would recommend building it only for existing nodes :p
Properties might be more difficult to handle and less useful (and they
raise a lot more questions).
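A sketch of the node-level REPLACE / IGNORE option, modelling a workspace as
a map from node path to that node's properties. This is a deliberate
simplification; Restorer and ConflictPolicy are hypothetical names, not the
Visitor interface David mentions. The policy applies to whole nodes only:
properties are never merged one by one.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class Restorer {
    // Hypothetical conflict policy, applied to whole nodes only.
    enum ConflictPolicy { REPLACE, IGNORE }

    // Copies backed-up nodes (path -> properties) into the live workspace.
    // Missing nodes are always created. For a node that already exists,
    // REPLACE swaps in the backed-up properties as a unit and IGNORE keeps
    // the live node untouched.
    static void restore(Map<String, Map<String, String>> backup,
                        Map<String, Map<String, String>> workspace,
                        ConflictPolicy policy) {
        for (Map.Entry<String, Map<String, String>> node : backup.entrySet()) {
            boolean exists = workspace.containsKey(node.getKey());
            if (!exists || policy == ConflictPolicy.REPLACE) {
                workspace.put(node.getKey(), new HashMap<>(node.getValue()));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> backup = new LinkedHashMap<>();
        backup.put("/docs/a", Map.of("title", "from backup"));
        backup.put("/docs/b", Map.of("title", "only in backup"));

        Map<String, Map<String, String>> live = new LinkedHashMap<>();
        live.put("/docs/a", Map.of("title", "edited since backup"));

        // IGNORE keeps the edited /docs/a and only adds /docs/b.
        restore(backup, live, ConflictPolicy.IGNORE);
        System.out.println(live);
    }
}
```

Keeping the decision at node granularity avoids questions such as what to
do when a backed-up node has fewer properties than the live one.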

nico
My blog! http://www.deviant-abstraction.net !!

On 5/24/06, David Kennedy <davek@us.ibm.com> wrote:
>
> "Nicolas Toper" <ntoper@gmail.com> wrote on 05/24/2006 12:03:02 PM:
>
> > You are right: there are two different issues there so possibly two
> > different locking strategies: one for backup and one for restoring.
> >
> > We want to cover only one use case for now: the regular backup of data. You
> > would call the tool through cron every day, at midnight for instance.
> >
> > There is another use case we need to work on later (after the first project
> > for Google is completed): exceptional backup, for instance before a
> > migration between different versions. From my point of view, I would like
> >
>
> Missed the complete thought here...
>
> A regular backup is fine, but I don't expect people to stop using the
> repository while the backup is occurring.  Anything we can do to minimize
> the locked set is advantageous to consumers.
>
> I'm not sure how an exceptional backup differs from a regular backup other
> than in when it is performed.
>
> > Are those use cases covering your needs?
> >
> > Backup
> > We need to have a coherent image of the system. For a given workspace, is
> > read access always coherent and consistent? I would prefer to avoid
> > locking the workspace, since we are backing up a lot of data and it is going
> > to take some time.
> >
>
> The read access should be coherent and consistent assuming a transactional
> system.  A locked workspace is the easiest to deal with, but the most
> inconvenient for consumers.  Minimizing the lock set adds complexity
> because there are cases where transactions occurring during the backup can
> affect the data consistency (e.g. References: Node A has a property that
> references Node B....need to ensure when A gets backed up that B and all
> parents do or B may not exist by the time you get to it)
>
> > A workspace lock might not even solve all issues. Has any of you already
> > solved this? If so, how?
> >
> > Restore
> > I propose to actually lock the workspace while restoring: a restore
> > operation is rare and would delete the previous content. There shouldn't be
> > any other operations taking place.
> >
>
> Seems like a reasonable starting point.  Will the restore take a Visitor
> to give requestors the ability to REPLACE or IGNORE existing nodes or
> properties?
>
> > Is this approach correct?
> >
> > Nicolas
> >
> >
> > On 5/24/06, David Kennedy <davek@us.ibm.com> wrote:
> > >
> > > Is the intent to lock out the entire workspace while backup or restore is
> > > occurring......no writes?!  Can the transactions that occur during backup
> > > be recorded and played back to the backup system to avoid having to lock
> > > the entire workspace?
> > >
> > > Restore is a different issue though....which takes precedence: non-restore
> > > operations that occur during restore or the nodes from the backup?  Is it
> > > feasible to restore clusters (transitive closure containing references) of
> > > nodes?  Ideally we wouldn't have to lock the entire workspace.
> > >
> > > David
> > >
> > >
> > >
> > > "Nicolas Toper" <ntoper@gmail.com>
> > > 05/24/2006 10:12 AM
> > > Please respond to
> > > dev@jackrabbit.apache.org
> > >
> > >
> > > To
> > > dev@jackrabbit.apache.org, tobias.bocanegra@day.com
> > > cc
> > >
> > > Subject
> > > Re: Google Summer of Code project for Jackrabbit
> > >
> > >
> > >
> > >
> > >
> > >
> > > Hi Tobias,
> > >
> > > Thanks for your feedback.
> > >
> > > I assume I would need to lock the workspace using the getLockManager of
> > > WorkspaceImpl.
> > >
> > > Do those locks time out? I mean, for instance, if the backup application
> > > crashes while I hold a lock on a workspace, I would need to clean it up as
> > > soon as possible. If they don't, maybe we should add that to Jackrabbit
> > > through, for instance, a lease.
> > >
> > > What do you think?
> > >
> > > nico
> > > My blog! http://www.deviant-abstraction.net !!
> > >
> > >
> > > On 5/24/06, Tobias Bocanegra <tobias.bocanegra@day.com> wrote:
> > > >
> > > > hi nicolas,
> > > > sounds very promising. just one important comment so far:
> > > >
> > > > > - All updates and reads are isolated through transactions. Do we need to
> > > > > define a locking strategy? If I am correct, I can read a node even though it
> > > > > is locked, and it is thread-safe. You don't commit an incoherent modification.
> > > >
> > > > that's not quite true, they are only "READ COMMITTED" isolated, i.e.
> > > > you need to lock the entire workspace before you can perform a backup.
> > > >
> > > > regards, tobi
> > > > --
> > > > -----------------------------------------< tobias.bocanegra@day.com >---
> > > > Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
> > > > T +41 61 226 98 98, F +41 61 226 98 97
> > > > -----------------------------------------------< http://www.day.com >---
> > > >
> > >
> > >
> > >
> >
> >
> > --
> > a+
> > nico
> > My blog! http://www.deviant-abstraction.net !!
>
>
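David's reference example in the quoted thread (Node A references Node B, so
B and all of B's parents must be captured together with A) and his question
about restoring "clusters (transitive closure containing references)" both
come down to computing a closure over references and ancestors. A minimal
sketch, assuming nodes are identified by slash-separated paths and
references are given as a plain map (ReferenceClosure, parent and closure
are hypothetical names, not Jackrabbit APIs):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: the set of nodes that must be backed up (or restored)
// together with a starting set: each node, all of its ancestors, and every
// node it references, transitively. Reference cycles are handled.
class ReferenceClosure {
    // "/a/b" -> "/a", "/a" -> "/"
    static String parent(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    static Set<String> closure(List<String> start, Map<String, List<String>> references) {
        Set<String> result = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(start);
        while (!work.isEmpty()) {
            String path = work.pop();
            if (!result.add(path)) {
                continue; // already visited
            }
            if (!path.equals("/")) {
                work.push(parent(path)); // a node needs its parents
            }
            for (String target : references.getOrDefault(path, List.of())) {
                work.push(target); // and everything it references
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Node A (/docs/a) has a reference property pointing to Node B (/shared/b).
        Map<String, List<String>> refs = Map.of("/docs/a", List.of("/shared/b"));
        System.out.println(closure(List.of("/docs/a"), refs));
        // prints [/, /docs, /docs/a, /shared, /shared/b]
    }
}
```

Everything in the returned set would have to be read (or locked) under one
consistent view for the copy to be self-consistent; with only READ COMMITTED
isolation, providing that view is the job of the lock.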
