jackrabbit-oak-dev mailing list archives

From Alex Parvulescu <alex.parvule...@gmail.com>
Subject Re: Inefficient backup on TarMK
Date Thu, 20 Mar 2014 17:45:03 GMT
Hi,

Indeed you are right, local backup is pretty efficient and it will perform
properly when it has the checkpoint available.

I was a bit off in my initial observations: I had tried to back up using an
HttpStore-based setup for testing the failover, and I assumed that the
slowness I perceived applies to local backups as well.

The checkpoint operation is key. When you try to back up via HTTP (using
the HttpStore) it is not currently available, so the backup will not be as
efficient: even if it doesn't write anything (no output from JsopDiff), it
will traverse the entire source repository, which is far from ideal.
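
To illustrate the difference, here is a minimal Java sketch. It does not use
the Oak API: Node, sameRecord(), DiffSketch and the record numbers are
hypothetical stand-ins, and the only assumption is that a diff can skip a
subtree when both sides point at the same stored record. Against a checkpoint
in the same repository the unchanged subtree is skipped; across repositories
the identifiers never match, so every node is visited even though no change
is reported.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a stored node; this is not Oak's NodeState.
final class Node {
    final String repoId;  // repository that holds the record
    final int recordId;   // identifier, only meaningful inside repoId
    final String value;
    final Map<String, Node> children = new LinkedHashMap<>();

    Node(String repoId, int recordId, String value) {
        this.repoId = repoId;
        this.recordId = recordId;
        this.value = value;
    }

    Node child(String name, Node node) {
        children.put(name, node);
        return this;
    }

    // Cheap equality check: same record in the same repository.
    boolean sameRecord(Node other) {
        return repoId.equals(other.repoId) && recordId == other.recordId;
    }
}

final class DiffSketch {
    int visited;  // nodes whose content had to be compared
    int changes;  // changes that would be written to the backup

    void diff(Node before, Node after) {
        if (before.sameRecord(after)) {
            return;  // identical record: skip the whole subtree
        }
        visited++;
        if (!before.value.equals(after.value)) {
            changes++;
        }
        for (Map.Entry<String, Node> e : after.children.entrySet()) {
            Node old = before.children.get(e.getKey());
            if (old == null) {
                changes++;  // child added
            } else {
                diff(old, e.getValue());
            }
        }
        for (String name : before.children.keySet()) {
            if (!after.children.containsKey(name)) {
                changes++;  // child removed
            }
        }
    }

    // Small five-node tree stored in the given (hypothetical) repository.
    static Node tree(String repoId) {
        Node a = new Node(repoId, 2, "a")
                .child("x", new Node(repoId, 3, "x"))
                .child("y", new Node(repoId, 4, "y"));
        return new Node(repoId, 1, "root")
                .child("a", a)
                .child("b", new Node(repoId, 5, "b"));
    }

    // New head in the same repository: "b" changed, "a" is shared as-is.
    static Node headAfterOneChange(Node checkpoint) {
        return new Node(checkpoint.repoId, 6, checkpoint.value)
                .child("a", checkpoint.children.get("a"))
                .child("b", new Node(checkpoint.repoId, 7, "b2"));
    }

    public static void main(String[] args) {
        Node checkpoint = tree("source");
        DiffSketch local = new DiffSketch();
        local.diff(checkpoint, headAfterOneChange(checkpoint));
        System.out.println("checkpoint diff: visited=" + local.visited
                + " changes=" + local.changes);

        DiffSketch remote = new DiffSketch();
        remote.diff(tree("backup"), tree("source"));
        System.out.println("cross-repository diff: visited=" + remote.visited
                + " changes=" + remote.changes);
    }
}
```

In the second case the content is identical, so nothing would be written, yet
the whole tree is walked. That is the pattern I see over the HttpStore.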

So this issue will only affect the failover scenario, but it may be worth
investigating. You cannot currently set the head via the HttpStore, but if
we push the checkpoint operation further down (from the NodeStore to
the SegmentStore) we can have this working via HTTP as well, in the form of
a POST of sorts.

best,
alex



On Thu, Mar 20, 2014 at 4:45 PM, Jukka Zitting <jukka.zitting@gmail.com> wrote:

> Hi,
>
> On Thu, Mar 20, 2014 at 11:05 AM, Alex Parvulescu
> <alex.parvulescu@gmail.com> wrote:
> > On Thu, Mar 20, 2014 at 1:33 PM, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> >> Perhaps the comparison is between content in the source repository and
> >> that in the backup repository? In that case the segment identifiers
> >> wouldn't match, and the comparison would slow down as described.
> >
> > Yes exactly, the backup runs a diff between the source instance and the
> > target instance.
>
> Ideally it shouldn't, that's why the backup code tries to instead use
> the checkpoint that was used for the previous backup [1]. The
> comparison across repositories is much slower and produces some
> slightly unexpected behavior like what you're probably seeing here.
>
> > One would expect that the backup is incremental in the
> > sense that running it a consecutive time yields only the modifications
> > that happened during that time period but these findings show otherwise.
>
> If the comparison is done across repositories, the content diff will
> call childNodeChanged() even on unmodified nodes as permitted by
> OAK-914 [2]. The reason for this is that the child node identifiers
> will not match across repositories, so there's no efficient way for
> the content diff to tell whether the subtrees are equal or not.
>
> However, AFAICT these extra childNodeChanged() calls should only
> result in some slowdown as ApplyDiff recurses down the tree, not
> actual changes to be written by SegmentWriter. Can you for example try
> dumping the output of JsopDiff.diffToJsop() before line 74 in
> FileStoreBackup to verify that no changes are unexpectedly showing up?
>
> [1]
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/backup/FileStoreBackup.java#L61
> [2] https://issues.apache.org/jira/browse/OAK-914
>
> BR,
>
> Jukka Zitting
>
