jackrabbit-oak-dev mailing list archives

From Alex Parvulescu <alex.parvule...@gmail.com>
Subject Re: Inefficient backup on TarMK
Date Thu, 20 Mar 2014 15:05:07 GMT
On Thu, Mar 20, 2014 at 1:33 PM, Jukka Zitting <jukka.zitting@gmail.com> wrote:

> Hi,
>
> On Thu, Mar 20, 2014 at 5:50 AM, Alex Parvulescu
> <alex.parvulescu@gmail.com> wrote:
> > The problem I experienced comes in when there are enough content writes
> > that a segment flush is triggered, so basically the same node, even if
> > unchanged, ends up in a different segment, and so with a different
> > segment id.
>
> Hmm, I believe you're seeing something different here, as a flush will
> never change the identifier of a node.
>

Good point.
Note: I did not say 'identifier of a node', I said segment ids change for a
node.

I tried to simplify things a bit and ran the backup on a shut-down
instance, to isolate some of the moving parts.
My first observation is that a node can end up in a different segment on
the target instance. I initially thought this was caused by the segment
flush; given your comments, I'm no longer sure why it happens.

Ex:
CH /tmp[0d05550a-0cb4-44e8-a217-f28fd876dcfb:261764 vs
bed530fe-65ad-4306-a803-56d96f9fceff:258604]
CH /home[0d05550a-0cb4-44e8-a217-f28fd876dcfb:257752 vs
6363de84-0a1e-4ec3-a8e4-84570129b938:261564]
CH /home/rep:policy[0d05550a-0cb4-44e8-a217-f28fd876dcfb:260940 vs
6363de84-0a1e-4ec3-a8e4-84570129b938:261608]

  CH means a change was detected on the given path, and you can see the
different segment ids. This output comes from running a backup on the same
source more than once.
  Record id notation is segmentid:offset, for example
"0d05550a-0cb4-44e8-a217-f28fd876dcfb:261764".
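
To make the notation concrete, here is a minimal sketch that splits a record
id into its segment id and offset and parses one of the CH lines above. The
helpers `parse_record_id` and `parse_change_line` are hypothetical names for
this illustration only; they are not part of Oak. The two ids on a CH line
are called "left" and "right" because the output does not say which side is
the source.

```python
import re
import uuid
from typing import NamedTuple, Tuple


class RecordId(NamedTuple):
    """A record id in the 'segmentid:offset' notation used above."""
    segment_id: uuid.UUID
    offset: int


def parse_record_id(text: str) -> RecordId:
    # Split on the last ':' so the UUID part stays intact.
    segment, _, offset = text.strip().rpartition(":")
    return RecordId(uuid.UUID(segment), int(offset))


# Hypothetical pattern for lines shaped like: CH <path>[<id> vs <id>]
CHANGE_LINE = re.compile(
    r"^CH\s+(?P<path>\S+?)\[(?P<left>\S+)\s+vs\s+(?P<right>\S+)\]$"
)


def parse_change_line(line: str) -> Tuple[str, RecordId, RecordId]:
    # Collapse whitespace first, since the mail client wrapped the lines.
    match = CHANGE_LINE.match(" ".join(line.split()))
    if match is None:
        raise ValueError("not a CH line: %r" % line)
    return (
        match.group("path"),
        parse_record_id(match.group("left")),
        parse_record_id(match.group("right")),
    )


path, left, right = parse_change_line(
    "CH /tmp[0d05550a-0cb4-44e8-a217-f28fd876dcfb:261764 vs\n"
    "bed530fe-65ad-4306-a803-56d96f9fceff:258604]"
)
# The path is the same on both sides, but the segment ids differ.
same_segment = left.segment_id == right.segment_id
```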

The same change is reported every time the backup runs: the segment id on
the target instance never ends up matching the one on the source, even
though the content is the same.


>
> > With time more and more segments are created, and as far as I can see
> > nodes that have no changes migrate to different segments.
>
> An unchanged node should never move from one segment to another within
> a given repository.
>
> Perhaps the comparison is between content in the source repository and
> that in the backup repository? In that case the segment identifiers
> wouldn't match, and the comparison would slow down as described.
>
>
Yes, exactly: the backup runs a diff between the source instance and the
target instance. One would expect the backup to be incremental, in the
sense that running it again picks up only the modifications made since the
previous run, but these findings show otherwise.
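
The effect can be illustrated with a toy model. This is only a sketch under
stated assumptions, not Oak's actual diff code: it assumes a comparison that
skips a subtree when both sides have the same record id and otherwise falls
back to comparing content. The `Node` class, `diff`, `build_tree` and the
"seg-a"/"seg-b" ids are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    record_id: str  # hypothetical "segmentid:offset" string
    properties: Dict[str, str] = field(default_factory=dict)
    children: Dict[str, "Node"] = field(default_factory=dict)


def diff(a: Node, b: Node, path: str = "/") -> Tuple[List[str], int]:
    """Return (paths whose content differs, number of nodes visited)."""
    if a.record_id == b.record_id:
        # Fast path: same record, so the whole subtree is skipped.
        return [], 1
    changed: List[str] = []
    visited = 1
    if a.properties != b.properties or a.children.keys() != b.children.keys():
        changed.append(path)
    # Record ids differ, so every common child has to be compared as well.
    for name in sorted(a.children.keys() & b.children.keys()):
        child_path = path.rstrip("/") + "/" + name
        sub_changed, sub_visited = diff(a.children[name], b.children[name], child_path)
        changed += sub_changed
        visited += sub_visited
    return changed, visited


def build_tree(segment: str) -> Node:
    # Same content every time; only the segment part of the ids varies.
    policy = Node(segment + ":30")
    home = Node(segment + ":20", {}, {"rep:policy": policy})
    tmp = Node(segment + ":10", {"a": "1"})
    return Node(segment + ":0", {}, {"tmp": tmp, "home": home})


source = build_tree("seg-a")
backup = build_tree("seg-b")

within_one_store = diff(source, source)  # ids match, one node visited
across_stores = diff(source, backup)     # ids never match, all nodes visited
```

In this model, comparing the source against itself stops at the root, while
comparing it against the backup walks all four nodes and still finds no
content difference. That is the opposite of what one would want from an
incremental backup.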


thanks,
alex




> BR,
>
> Jukka Zitting
>
