jackrabbit-oak-dev mailing list archives

From Alex Parvulescu <alex.parvule...@gmail.com>
Subject Inefficient backup on TarMK
Date Thu, 20 Mar 2014 09:50:22 GMT
Hi,

I'd like to ask for advice about a problem I've noticed recently concerning
the TarMK backup.

At its core, the TarMK backup relies on a regular content diff. The first
backup finds nothing on the target and copies all nodes over; the second
and later backups diff the content and apply the changes incrementally.
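To make the diff-and-apply idea concrete, here is a minimal sketch. It is
not the actual backup code: the class name IncrementalBackupSketch, the
backup method and the flat path-to-value maps standing in for node trees
are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class IncrementalBackupSketch {
    /**
     * Applies the difference between source and target to target and
     * returns the number of changes applied. Flat path -> value maps
     * stand in for real node trees (an assumption of this sketch).
     */
    static int backup(Map<String, String> source, Map<String, String> target) {
        int changes = 0;
        // Copy entries that are missing from, or differ in, the target.
        for (Map.Entry<String, String> e : source.entrySet()) {
            if (!e.getValue().equals(target.get(e.getKey()))) {
                target.put(e.getKey(), e.getValue());
                changes++;
            }
        }
        // Remove entries that no longer exist in the source.
        int before = target.size();
        target.keySet().retainAll(source.keySet());
        changes += before - target.size();
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> repo = new HashMap<>();
        repo.put("/a", "1");
        repo.put("/b", "2");
        Map<String, String> backupStore = new HashMap<>();

        // First run: the target is empty, so everything is copied.
        System.out.println(backup(repo, backupStore)); // prints 2
        // Later runs: only the delta is applied.
        repo.put("/b", "3");
        System.out.println(backup(repo, backupStore)); // prints 1
    }
}
```

Note that even this sketch has to look at every source entry to find the
delta, which is why a cheap way to skip unchanged subtrees matters.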

One optimization of the TarMK diff is to check whether the segment ids of
two node states are the same, which makes for a really fast compareTo
method.

These two combined make for a fast, incremental backup. So far so good.

The problem I experienced comes in when there are enough content writes to
trigger a segment flush: the same node, even if unchanged, ends up in a
different segment, and therefore has a different segment id.
The backup then fails to fast-match the node states and falls back to
traversing the content to match and apply changes, except there are none.
Over time more and more segments are created and, as far as I can see,
nodes that have no changes migrate to different segments. All these
migrations are seen as changes and trigger content traversals.
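A small sketch of the fast match and of the fallback, under stated
assumptions: the Node class, the recordId string (standing in for "segment
id plus offset"), sameContent and the traversed counter are hypothetical
and are not Oak's real classes or API.

```java
import java.util.Map;
import java.util.TreeMap;

public class SegmentCompareSketch {
    /** Hypothetical, simplified node state; not Oak's real node state class. */
    static final class Node {
        final String recordId; // stand-in for "segment id + offset" of the stored record
        final String value;    // stand-in for the node's properties
        final Map<String, Node> children = new TreeMap<>();

        Node(String recordId, String value) {
            this.recordId = recordId;
            this.value = value;
        }

        Node child(String name, Node node) {
            children.put(name, node);
            return this;
        }
    }

    static int traversed = 0; // nodes that had to be compared by content

    static boolean sameContent(Node a, Node b) {
        if (a.recordId.equals(b.recordId)) {
            return true; // fast path: same stored record, nothing to traverse
        }
        traversed++; // slow path: compare content and recurse into children
        if (!a.value.equals(b.value)
                || !a.children.keySet().equals(b.children.keySet())) {
            return false;
        }
        for (Map.Entry<String, Node> e : a.children.entrySet()) {
            if (!sameContent(e.getValue(), b.children.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    /** Builds the same three-node content, stored under the given segment id. */
    static Node tree(String segment) {
        return new Node(segment + ":0", "root")
                .child("a", new Node(segment + ":1", "x"))
                .child("b", new Node(segment + ":2", "y"));
    }

    public static void main(String[] args) {
        traversed = 0;
        // Same segment: matched by id, no traversal.
        System.out.println(sameContent(tree("seg-1"), tree("seg-1")) + " " + traversed); // prints "true 0"
        traversed = 0;
        // Same content rewritten into another segment: equal, but fully traversed.
        System.out.println(sameContent(tree("seg-1"), tree("seg-2")) + " " + traversed); // prints "true 3"
    }
}
```

In the second call nothing has changed, yet every node is visited; on a
real repository that is the full content traversal described above.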

The reason this escalates is that the incremental backup never updates the
segment ids on the target instance; it only looks at content. So each
incremental backup reports more and more changes and traverses the repo
content simply because the segments have been restructured.

thoughts?


thanks,
alex
