jackrabbit-users mailing list archives

From Alexander Klimetschek <aklim...@adobe.com>
Subject Re: how efficiently is versioning implemented?
Date Wed, 16 Feb 2011 16:27:22 GMT
On 16.02.11 16:39, "Günther Schmidt" <gue.schmidt@web.de> wrote:
>how efficiently is versioning implemented?
>Is it similar to copy-on-write, i.e. a new version of a document only
>consists of deltas to the previous version? Or is every version of a
>document full-sized? I presume in RDBMS backends it would always be
>full-sized versions, but what about file-based repositories?

In all cases the entire binary is written. The persistence manager, where
you can choose between an RDBMS and other backends, doesn't know about
versioning; at this level everything is just simple node bundles.

However, for binaries there is the (generally recommended) DataStore [0]
that will store large binaries separately, directly as files. A binary
is stored only once, even if you have multiple copies of it in the
repository, because it is identified by a hash of its contents.
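
To illustrate the idea (this is only a minimal sketch, not Jackrabbit's actual DataStore code), a content-addressed file store could look roughly like the following. The class name ContentAddressedStore, its methods and the choice of SHA-256 are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.stream.Stream;

/** Hypothetical toy store: one file per distinct content, named by its hash. */
final class ContentAddressedStore {
    private final Path dir;

    ContentAddressedStore(Path dir) {
        this.dir = dir;
    }

    /** Convenience factory for experiments: uses a fresh temporary directory. */
    static ContentAddressedStore inTempDir() {
        try {
            return new ContentAddressedStore(Files.createTempDirectory("datastore-sketch"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the content only if no file with the same hash exists; returns the hash. */
    String put(byte[] content) {
        String id = sha256Hex(content);
        Path file = dir.resolve(id);
        try {
            if (!Files.exists(file)) {
                Files.write(file, content);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return id;
    }

    /** Reads the content back by its identifier. */
    byte[] get(String id) {
        try {
            return Files.readAllBytes(dir.resolve(id));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of files actually present on disk. */
    long fileCount() {
        try (Stream<Path> files = Files.list(dir)) {
            return files.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // SHA-256 is an arbitrary choice for this sketch.
    private static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling put() twice with identical bytes returns the same identifier and leaves a single file in the directory; different bytes produce a second file.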

Thus if you create a new version of a node with a binary property, but
only change other properties, not the binary, the binary will not be
stored twice. But if you change the binary, the full binary will be
stored. There is no diffing for versions. (To be exact, you actually write
to the normal repository location ("HEAD") first, then save and only then
create the version, which means creating a version is an internal copy).
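
As a rough model of the behaviour described above (again just a sketch under assumptions, not the real implementation), think of each version as a full copy of the node's properties, where a binary property only holds the hash that points into the store. VersionedNodeSketch and all of its method names are made up for this example.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of one versioned node with hash-referenced binaries. */
final class VersionedNodeSketch {
    // Stand-in for the DataStore: binary contents keyed by content hash.
    private final Map<String, byte[]> blobs = new HashMap<>();
    // Current ("HEAD") state; binary properties hold only the hash.
    private final Map<String, String> head = new HashMap<>();
    // Every version is a complete copy of the property map, never a diff.
    private final List<Map<String, String>> versions = new ArrayList<>();

    void setProperty(String name, String value) {
        head.put(name, value);
    }

    void setBinary(String name, byte[] content) {
        String id = sha256Hex(content);
        blobs.putIfAbsent(id, content.clone()); // stored once per distinct content
        head.put(name, id);
    }

    /** Copies the current HEAD state into a new version; returns its 1-based number. */
    int createVersion() {
        versions.add(new HashMap<>(head));
        return versions.size();
    }

    int storedBinaries() {
        return blobs.size();
    }

    String versionProperty(int version, String name) {
        return versions.get(version - 1).get(name);
    }

    private static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this model, changing only a text property between two createVersion() calls leaves storedBinaries() unchanged, while changing the binary adds one full new entry.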

Regarding efficiency, it depends on which kind of efficiency you mean:
read/write performance or space usage? The current implementation is
optimized for read (and partly write) performance, at the cost of requiring
more disk space. Reading binaries, even from older versions, is simply a
direct I/O stream from the disk, without any conversions or diff
calculation. Writes work similarly, albeit with a small overhead for
the hash calculation compared to reads.
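
To make the read/write difference concrete, here is a small sketch using only the JDK (the class StreamingCopySketch and its method names are invented for the example, and SHA-256 is an assumed algorithm): the read path is a plain stream copy, and the write path is the same copy with a digest updated along the way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper contrasting a plain copy with a copy that also hashes. */
final class StreamingCopySketch {

    /** Read path: bytes are passed through unchanged; returns the byte count. */
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Write path: the same copy, but each chunk read also updates the digest. */
    static String copyAndHash(InputStream in, OutputStream out) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        // The caller stays responsible for closing both streams.
        copy(new DigestInputStream(in, md), out);
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

The only extra work on the write path is the digest update per chunk; no diff against a previous version is computed in either direction.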

[0] http://wiki.apache.org/jackrabbit/DataStore


Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel
