jackrabbit-dev mailing list archives

From "Nicolas Toper" <nto...@gmail.com>
Subject Re: Google Summer of Code project for Jackrabbit
Date Fri, 26 May 2006 16:02:52 GMT
Hi,

I think we all agree now on how to handle "hot backup" and how to avoid
write locking a workspace.

This functionality was not initially planned in my proposal, but since we
need it, I will implement it.

I will write a summary on this issue in the next few days for your
validation.

Cheers,
Nicolas

My blog! http://www.deviant-abstraction.net !!

On 5/26/06, Stefan Guggisberg <stefan.guggisberg@gmail.com> wrote:
>
> hi nico
>
> On 5/25/06, Nicolas Toper <ntoper@gmail.com> wrote:
> > Just to summarize everything we have said on this issue.
> >
> > There are two kinds of lock: the jcr.Lock and the
> > EDU.oswego.cs.dl.util.concurrent.*. The two are more or less unrelated.
> > Am I correct?
> >
> > There are no issues with jcr.Lock (we can still read a node).
> >
> > We need some mutex to avoid inconsistent IO operations using the
> > util.concurrent package. I like Tobias' approach of adding a proxyPM. It
> > seems easy. But is this solution elegant enough and maintainable in the
> > long run? Would it help us later? (I think so, since it would allow
> > delayed writes, which opens the way for a two-phase locking algorithm.)
> > I have not been on this project long enough to judge :p
> >
> > Why didn't Jackrabbit go for serializable transactions, by the way? I have
> > checked the code and it seems we have all the kinds of locks needed to
> > support 2PL (out of scope of the current project, of course).
> >
> > If we plan to support serializable transactions soon, then case 2 is
> > acceptable. Is this the case?
> >
> > About Tobias' ProxyPM: I am OK to write it. Although it is out of scope of
> > the initial project, you all seem to really need it, so let's go for it.
> > Jukka?
> >
> > For a specific workspace, I would still allow read operations from other
> > sessions and isolate all write access (this way there will be no
> > conflict). I can even persist the modifications using an already
> > existing PM in case of a crash. One question though: I cannot guarantee
> > that the transaction would later be committed without an exception. We can
> > choose to ignore this issue or add an asynchronous way to warn the
> > session. What are your thoughts on this?
> >
>
> we already have this scenario. a session's modifications are
> potentially committed asynchronously and the commit can fail for a
> number of reasons. that's fine with me.
>
> cheers
> stefan
>
> > This means a modification in the core package. Are you all OK with this?
> >
> >
> > By the way, this kind of algorithm is called a pessimistic receiver-based
> > message logging algorithm. We use it in distributed systems.
> >
> >
> >
> > Thanks for your support and ideas.
> > nico
> > My blog! http://www.deviant-abstraction.net !!
> >
> >
> >
> >
> > On 5/25/06, Tobias Bocanegra < tobias.bocanegra@day.com> wrote:
> > >
> > > i think there is a consensus of what backup levels there can be:
> > >
> > > 1) read locked workspaces
> > > 2) write locked workspaces
> > > 3) hot-backup (i.e. "SERIALIZABLE" isolation)
> > >
> > > in case 1, the entire workspace is completely locked (rw) and no one
> > > else than the backup-session can read-access the workspace. this is
> > > probably the easiest to implement and the least desirable.
> > >
> > > in case 2, the entire workspace becomes read-only, i.e. is
> > > write-locked. so all other sessions can continue reading the
> > > workspace, but are not allowed to write to it. this is also more or
> > > less easy to implement, introducing a 'global' lock in the lock
> > > manager.
> > >
> > > in case 3, all sessions can continue operations on the workspace, but
> > > the backup-session sees a snapshot view of the workspace. this would
> > > be easy, if we had serializable isolated transactions, which we don't
> > > :-(
> > >
> > > for larger production environments, only case 3 is acceptable. the way
> > > i see to implement this is to create a
> > > 'proxy-persistencemanager' that sits between the
> > > shareditemstatemanager and the real persistencemanager. during normal
> > > operation, it just passes the changes down to the real pm, but in
> > > backup-mode, it keeps its own storage for the changes that occur
> > > during the backup. when backup is finished, it resends all changes
> > > down to the real pm. using this mechanism, you have a stable snapshot
> > > of the states in the real pm during backup mode. the export would then
> > > access directly the real pm.
> > >
> > > regards, toby
> > >
> > >
> > > On 5/25/06, Nicolas Toper < ntoper@gmail.com> wrote:
> > > > Hi David,
> > > >
> > > > Sorry to have been unclear.
> > > >
> > > > What I meant is we have two different kinds of backup to perform.
> > > >
> > > > In one use case I call "regular backup", it is the kind of backup you
> > > > perform every night. You do not mind missing content that was just
> > > > updated, since you will have it the day after.
> > > >
> > > > In the other use case I call "exceptional backup", you want to have all
> > > > the data because, for instance, you will destroy the repository
> > > > afterwards.
> > > >
> > > > Those two differ, I think, in small points. For instance, for "regular
> > > > backup", we don't care about transactions started but not committed. In
> > > > the second one, we do.
> > > >
> > > > I propose to support only the first use case. The second one could be
> > > > added easily later.
> > > >
> > > > I don't know how Jackrabbit is used in production environments. Is it
> > > > feasible to lock one workspace at a time, or is it too cumbersome for
> > > > the customer?
> > > >
> > > > For instance, if backing up a workspace needs a two-minute workspace
> > > > lock, then it can be done without affecting availability (but it
> > > > would affect reliability). We need data to estimate whether it is
> > > > needed. Can you give me the size of a typical workspace please?
> > > >
> > > > I am OK to record the transaction and commit it after the locking has
> > > > occurred, but this means changing the semantics of Jackrabbit (a
> > > > transaction initiated while a lock is on would be performed after the
> > > > lock is released instead of raising an exception) and I am not sure
> > > > everybody would think it is a good idea. We would need to add a
> > > > transaction log (is there one already?) and parse transactions to detect
> > > > conflicts (or capture exceptions maybe). We would no longer be able to
> > > > guarantee a transaction is persistent, and it might have an impact on
> > > > performance. And what about timeouts when running a transaction?
> > > >
> > > > Another idea would be: monitor Jackrabbit and launch the backup when we
> > > > have a high probability that no transactions are going to be started.
> > > > But I think sysadmins already know when load is minimal on their system.
> > > >
> > > > Another idea would be, as Miro stated, to use a "lower" level strategy
> > > > (working on the DB level or directly on the FS). It was actually my
> > > > first backup strategy, but Jukka thought we have to be able to use the
> > > > tool to migrate from one PM to another.
> > > >
> > > > Here is my suggestion on the locking strategy: we can extend the backup
> > > > tool later if needed. Right now, even with a global lock, it is an
> > > > improvement compared to the current situation. And I need to release
> > > > the project before August 21.
> > > >
> > > > I would prefer to start with locking one workspace at a time and, if I
> > > > still have time, then find a way to work with minimal locking. I will
> > > > most probably keep working on Jackrabbit after the Google SoC is over.
> > > > Are you OK with this approach?
> > > >
> > > > We are OK on the restore operation. Good idea for the replace or ignore
> > > > option, but I would recommend building it only for existing nodes :p
> > > > Properties might be more difficult to handle and not as useful (and it
> > > > raises a lot more questions).
> > > >
> > > > nico
> > > > My blog! http://www.deviant-abstraction.net !!
> > > >
> > > >
> > >
> > >
> >
> >
>
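
The 'proxy-persistencemanager' idea quoted above (pass writes through during normal operation, buffer them during backup, replay them when the backup ends) could be sketched roughly as follows. This is only an illustration under stated assumptions: `SimpleStore`, `InMemoryStore` and `BackupProxyStore` are hypothetical names invented for this sketch and are not Jackrabbit's PersistenceManager API. Item states are reduced to strings, and deletions, crash recovery of the buffer, and failures while replaying are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a persistence manager (not Jackrabbit's interface). */
interface SimpleStore {
    void store(String itemId, String state);
    String load(String itemId);
}

/** Trivial in-memory store playing the role of the "real pm" in this sketch. */
class InMemoryStore implements SimpleStore {
    private final Map<String, String> items = new LinkedHashMap<>();

    public synchronized void store(String itemId, String state) {
        items.put(itemId, state);
    }

    public synchronized String load(String itemId) {
        return items.get(itemId);
    }
}

/** Proxy that sits in front of the real store and buffers writes in backup mode. */
class BackupProxyStore implements SimpleStore {
    private final SimpleStore real;
    // Changes received while a backup is running; the last write per item wins.
    private final Map<String, String> pending = new LinkedHashMap<>();
    private boolean backupMode = false;

    BackupProxyStore(SimpleStore real) {
        this.real = real;
    }

    /** From now on the real store is left untouched, giving the backup a stable view. */
    public synchronized void beginBackup() {
        backupMode = true;
    }

    /** Replays all buffered changes to the real store and returns to pass-through. */
    public synchronized void endBackup() {
        for (Map.Entry<String, String> change : pending.entrySet()) {
            real.store(change.getKey(), change.getValue());
        }
        pending.clear();
        backupMode = false;
    }

    public synchronized void store(String itemId, String state) {
        if (backupMode) {
            pending.put(itemId, state);   // keep the change aside during the backup
        } else {
            real.store(itemId, state);    // normal operation: pass straight through
        }
    }

    public synchronized String load(String itemId) {
        // Regular sessions still see their own buffered writes during a backup.
        if (backupMode && pending.containsKey(itemId)) {
            return pending.get(itemId);
        }
        return real.load(itemId);
    }
}
```

In this sketch the export would read from the `InMemoryStore` directly while other sessions go through the `BackupProxyStore`. The replay in `endBackup()` is the point where the commit failure discussed above could surface, since nothing guarantees the buffered changes apply cleanly.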
