jackrabbit-dev mailing list archives

From "Tobias Bocanegra" <tobias.bocane...@day.com>
Subject Re: Google Summer of Code project for Jackrabbit
Date Thu, 25 May 2006 10:58:47 GMT
i think there is consensus on what backup levels there can be:

1) read locked workspaces
2) write locked workspaces
3) hot-backup (i.e. "SERIALIZABLE" isolation)

in case 1, the entire workspace is completely locked (rw) and no one
other than the backup session can read-access the workspace. this is
probably the easiest to implement and the least desirable.

in case 2, the entire workspace becomes read-only, i.e. is
write-locked. all other sessions can continue reading the workspace,
but are not allowed to write to it. this is also more or less easy to
implement, introducing a 'global' lock in the lock manager.
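a rough sketch of what such a global lock could look like (class and
method names here are made up for illustration, this is NOT the actual
jackrabbit lock-manager api): ordinary writers share the read side of a
read-write lock, and the backup session takes the exclusive write side,
which also waits for in-flight writes to drain.

```java
// Hypothetical sketch of a "global" write lock for backup mode (case 2).
// BackupAwareLockManager and its methods are illustrative names only.
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BackupAwareLockManager {
    // Concurrent writers share the read side; backup takes the write side.
    private final ReentrantReadWriteLock backupLock = new ReentrantReadWriteLock();

    /** Called by ordinary sessions before applying a change set. */
    public void beforeWrite() {
        if (!backupLock.readLock().tryLock()) {
            throw new IllegalStateException("workspace is read-only: backup in progress");
        }
    }

    /** Called by ordinary sessions after the change set is persisted. */
    public void afterWrite() {
        backupLock.readLock().unlock();
    }

    /** Called by the backup session: blocks until in-flight writes drain. */
    public void enterBackupMode() {
        backupLock.writeLock().lock();
    }

    /** Called by the backup session when the backup is done. */
    public void exitBackupMode() {
        backupLock.writeLock().unlock();
    }
}
```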

in case 3, all sessions can continue operations on the workspace, but
the backup-session sees a snapshot view of the workspace. this would
be easy, if we had serializable isolated transactions, which we don't
:-(

for larger productive environments, only case 3 is acceptable. the way
i see to implement this is to create a 'proxy-persistencemanager' that
sits between the shareditemstatemanager and the real persistencemanager.
during normal operation, it just passes the changes down to the real pm,
but in backup mode, it keeps its own storage for the changes that occur
during the backup. when the backup is finished, it resends all changes
down to the real pm. using this mechanism, you have a stable snapshot
of the states in the real pm during backup mode. the export would then
access the real pm directly.
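a rough sketch of that proxy idea (the interface below is a made-up
stand-in; the real jackrabbit PersistenceManager api works on item
states and change logs, not strings): during backup the proxy buffers
changes in its own queue, so the real pm stays frozen, and replays them
once the backup ends.

```java
// Illustrative sketch of the proxy persistence-manager idea; the real
// Jackrabbit PersistenceManager interface is more involved than this.
import java.util.ArrayDeque;
import java.util.Queue;

interface SimplePersistenceManager {
    void store(String change);   // stand-in for a node/property change set
}

class ProxyPersistenceManager implements SimplePersistenceManager {
    private final SimplePersistenceManager real;
    private final Queue<String> pending = new ArrayDeque<>();
    private boolean backupMode = false;

    ProxyPersistenceManager(SimplePersistenceManager real) {
        this.real = real;
    }

    @Override
    public synchronized void store(String change) {
        if (backupMode) {
            pending.add(change);      // keep own storage during backup
        } else {
            real.store(change);       // normal operation: pass through
        }
    }

    public synchronized void startBackup() {
        backupMode = true;            // real pm is now a stable snapshot
    }

    /** When the backup is finished, resend all buffered changes to the real pm. */
    public synchronized void endBackup() {
        backupMode = false;
        while (!pending.isEmpty()) {
            real.store(pending.remove());
        }
    }
}
```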

regards, toby


On 5/25/06, Nicolas Toper <ntoper@gmail.com> wrote:
> Hi David,
>
> Sorry to have been unclear.
>
> What I meant is we have two different kinds of backup to perform.
>
> In one use case, which I call "regular backup", it is the kind of backup you
> perform every night. You do not mind missing the content just updated,
> since you will have it the day after.
>
> In the other use case, which I call "exceptional backup", you want to have
> all the data because, for instance, you will destroy the repository afterwards.
>
> Those two differ, I think, only in small points. For instance, for "regular
> backup", we don't care about transactions started but not committed. In the
> second one, we do.
>
> I propose to support only the first use case. The second one could be added
> easily later.
>
> I don't know how JackRabbit is used in production environments. Is it
> feasible to lock one workspace at a time, or is that too cumbersome for the
> customer?
>
> For instance, if backing up a workspace requires a two-minute workspace
> lock, then it can be done without affecting availability (but it would
> affect reliability). We need data to estimate whether this is acceptable.
> Can you give me the size of a typical workspace please?
>
> I am OK to record the transactions and commit them after the lock has been
> released, but this means changing the semantics of Jackrabbit (a transaction
> initiated while a lock is held would be performed after the lock is released
> instead of raising an exception) and I am not sure everybody would think it
> is a good idea. We would need to add a transaction log (is there one
> already?) and parse transactions to detect conflicts (or capture exceptions
> maybe). We would no longer be able to guarantee that a transaction is
> persistent, and it might have an impact on performance. And what about
> timeouts when running a transaction?
>
> Another idea would be: monitor Jackrabbit and launch the backup when we have
> a high probability that no transactions are going to be started. But I think
> sysadmins already know when load is minimal on their system.
>
> Another idea would be, as Miro stated, to use a lower-level strategy
> (working at the DB level or directly on the FS). It was actually my first
> backup strategy, but Jukka thought we have to be able to use the tool to
> migrate from one PM to another.
>
> Here is my suggestion on the locking strategy: we can extend the backup tool
> later if needed. Right now, even with a global lock, it is an improvement
> over the current situation. And I need to release the project before
> August 21.
>
> I would prefer to start with locking one workspace at a time and, if I still
> have time, then find a way to work with a minimal lock. I will most probably
> keep working on Jackrabbit after the Google SoC is over. Are you OK with
> this approach?
>
> We are OK on the restore operation. Good idea for the replace-or-ignore
> option, but I would recommend building it only for existing nodes :p
> Properties might be more difficult to handle and not as useful (and it
> raises a lot more questions).
>
> nico
> My blog! http://www.deviant-abstraction.net !!
>
> On 5/24/06, David Kennedy <davek@us.ibm.com> wrote:
> >
> > "Nicolas Toper" <ntoper@gmail.com> wrote on 05/24/2006 12:03:02 PM:
> >
> > > You are right: there are two different issues there so possibly two
> > > different locking strategies: one for backup and one for restoring.
> > >
> > > We want to cover for now only one use case: the regular backup of data.
> > > You would call the tool through a cron every day at midnight, for instance.
> > >
> > > There is another use case we need to work on later (after the first
> > > project for Google is completed): exceptional backup. For instance, before
> > > a migration between different versions. From my point of view, I would like
> > >
> >
> > Missed the complete thought here...
> >
> > A regular backup is fine, but I don't expect people to stop using the
> > repository while the backup is occurring.  Anything we can do to minimize
> > the locked set is advantageous to consumers.
> >
> > I'm not sure how an exceptional backup differs from a regular backup other
> > than when it is performed.
> >
> > > Are those use cases covering your needs?
> > >
> > > Backup
> > > We need to have a coherent image of the system. For a given workspace,
> > > is the read access always coherent and consistent? I would prefer to
> > > avoid locking the workspace since we are backing up a lot of data and it
> > > is going to take some time.
> > >
> >
> > The read access should be coherent and consistent assuming a transactional
> > system.  A locked workspace is the easiest to deal with, but the most
> > inconvenient for consumers.  Minimizing the lock set adds complexity
> > because there are cases where transactions occurring during the backup can
> > affect the data consistency (e.g. References: Node A has a property that
> > references Node B....need to ensure when A gets backed up that B and all
> > parents do or B may not exist by the time you get to it)
> >
> > > A workspace lock might not even solve all issues. Has anyone of you
> > > already solved this? If yes, how?
> > >
> > > Restore
> > > I propose to actually lock the workspace while restoring: a restore
> > > operation is rare and would delete the previous content. There shouldn't
> > > be any other operations taking place.
> > >
> >
> > Seems like a reasonable starting point.  Will the restore take a Visitor
> > to give requestors the ability to REPLACE or IGNORE existing nodes or
> > properties?
> >
> > > Is this approach correct?
> > >
> > > Nicolas
> > >
> > >
> > > On 5/24/06, David Kennedy <davek@us.ibm.com > wrote:
> > > >
> > > > Is the intent to lock out the entire workspace while backup or restore
> > > > is occurring...no writes?!  Can the transactions that occur during
> > > > backup be recorded and played back to the backup system to avoid having
> > > > to lock the entire workspace?
> > > >
> > > > Restore is a different issue though...which takes precedence:
> > > > non-restore operations that occur during restore or the nodes from the
> > > > backup?  Is it feasible to restore clusters (transitive closures
> > > > containing references) of nodes?  Ideally we wouldn't have to lock the
> > > > entire workspace.
> > > >
> > > > David
> > > >
> > > >
> > > >
> > > > "Nicolas Toper" < ntoper@gmail.com>
> > > > 05/24/2006 10:12 AM
> > > > Please respond to
> > > > dev@jackrabbit.apache.org
> > > >
> > > >
> > > > To
> > > > dev@jackrabbit.apache.org, tobias.bocanegra@day.com
> > > > cc
> > > >
> > > > Subject
> > > > Re: Google Summer of Code project for Jackrabbit
> > > >
> > > >
> > > > Hi Tobias,
> > > >
> > > > Thanks for your feedback.
> > > >
> > > > I assume I would need to lock the workspace using the getLockManager
> > > > of WorkspaceImpl.
> > > >
> > > > Do those locks time out? I mean, if for instance the backup
> > > > application crashes while I hold a lock on a workspace, I would need
> > > > to clean it up as soon as possible. If they are not timed out, maybe
> > > > we should add that to JackRabbit through, for instance, a lease.
> > > >
> > > > What do you think?
> > > >
> > > > nico
> > > > My blog! http://www.deviant-abstraction.net !!
> > > >
> > > >
> > > > On 5/24/06, Tobias Bocanegra < tobias.bocanegra@day.com > wrote:
> > > > >
> > > > > hi nicolas,
> > > > > sounds very promising. just one important comment so far:
> > > > >
> > > > > > - All updates and reads are isolated through transactions. Do we
> > > > > > need to define a locking strategy? If I am correct, I can read a
> > > > > > node even though it is locked and it is threadsafe. You don't
> > > > > > commit an incoherent modification.
> > > > >
> > > > > that's not quite true, they are only "READ COMMITTED" isolated. i.e.
> > > > > you need to lock the entire workspace before you can perform a
> > > > > backup.
> > > > >
> > > > > regards, tobi
> > > > > --
> > > > > -----------------------------< tobias.bocanegra@day.com >---
> > > > > Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001
> > Basel
> > > > > T +41 61 226 98 98, F +41 61 226 98 97
> > > > > -------------------------------------< http://www.day.com >---
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > a+
> > > nico
> > > My blog! http://www.deviant-abstraction.net !!
> >
> >
>
>


-- 
-----------------------------------------< tobias.bocanegra@day.com >---
Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
T +41 61 226 98 98, F +41 61 226 98 97
-----------------------------------------------< http://www.day.com >---
