jackrabbit-dev mailing list archives

From Thomas Mueller <muel...@adobe.com>
Subject Re: Jackrabbit GC on huge MySQL Database
Date Wed, 17 Sep 2014 13:25:56 GMT

In Jackrabbit Oak, we have a different (much, much faster) approach to
garbage collection, but there is no plan to backport it to Jackrabbit
2.x. The approach is: scan the repository for blob ids (not a tree
traversal, but a low-level scan of the persistent storage). Then get the
list of blobs from the data store, and delete those that are not in the
list of blob ids in use.
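The scan-then-sweep idea above can be sketched roughly as follows. This is not the Jackrabbit or Oak API; the stores are stood in for by plain in-memory collections, and all names here are illustrative:

```java
import java.util.*;

// Sketch of an Oak-style "scan then sweep" blob GC, with in-memory
// stand-ins for the persistent store and the data store. Names are
// illustrative, not real Jackrabbit/Oak APIs.
public class BlobGcSketch {

    // 1. Low-level scan of the persistent storage for referenced blob ids.
    //    For this sketch, each stored record directly yields one blob id;
    //    the point is that no tree traversal is involved.
    static Set<String> scanReferencedBlobIds(Collection<String> records) {
        return new HashSet<>(records);
    }

    // 2. List all blobs in the data store and delete those not referenced.
    static List<String> sweep(Set<String> referenced, Set<String> dataStoreBlobs) {
        List<String> deleted = new ArrayList<>();
        for (String blobId : dataStoreBlobs) {
            if (!referenced.contains(blobId)) {
                deleted.add(blobId); // a real store would delete the blob here
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        Set<String> referenced =
            scanReferencedBlobIds(Arrays.asList("blob-a", "blob-b"));
        // TreeSet only to make the iteration order deterministic
        Set<String> dataStore =
            new TreeSet<>(Arrays.asList("blob-a", "blob-b", "blob-orphan"));
        System.out.println(sweep(referenced, dataStore)); // prints "[blob-orphan]"
    }
}
```

The key design point is that the scan reads the storage sequentially and builds the "in use" set in one pass, so the sweep is a simple set-difference over the data store listing.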

This is much faster mainly for two reasons: first (and most
importantly), it avoids random-access reads (the primary key for nodes
in Jackrabbit 2.x is randomly distributed; this is no longer the case
for the default storage engines in Jackrabbit Oak). Second, it avoids
marking all binaries that are still in use.

You could implement this for Jackrabbit 2.x, or you could switch to
Jackrabbit Oak.


On 16/09/14 13:50, "uv" <vlastimilunucka@gmail.com> wrote:

>our system uses Jackrabbit 2.6.5 and a MySQL DB datastore. The Jackrabbit
>DB schema size is 300GB, most of it in the datastore. When we run the
>Jackrabbit garbage collector, it runs for almost 3 days, and a running GC
>has a significant impact on application performance.
>Could you please advise what possibilities we have?
>Could we somehow split the GC so it does not iterate through the whole
>datastore? When the GC has not finished completely, we cannot run the
>datastore clean-up, because we cannot be sure what has been scanned and
>what has not.
>Or is there any other GC implementation?
>Thank you very much.
