jackrabbit-users mailing list archives

From Thomas Mueller <muel...@adobe.com>
Subject Re: clustered garbage collection
Date Thu, 26 May 2011 12:09:33 GMT
Hi,

Given the way garbage collection works, I don't see a potential problem
if you run it concurrently on several cluster nodes.

When garbage collection is running, each file that is accessed is
'touched' (the last modified time is changed to the current time). If you
run it concurrently, this will still happen. At the end of the GC, old
files (untouched files) are deleted.
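
To illustrate why overlapping runs are harmless, here is a minimal toy model of the touch-then-delete scheme described above. It is only a sketch, not Jackrabbit code: the names ToyDataStore, ToyCollector and run_overlapping_gc are hypothetical, a logical counter stands in for file modification times, and both collectors are assumed to see the same set of referenced records.

```python
import itertools


class ToyDataStore:
    """Toy stand-in for a data store: record id -> last-modified tick."""

    def __init__(self):
        self._clock = itertools.count(1)  # logical clock, not wall time
        self.last_modified = {}

    def now(self):
        return next(self._clock)

    def add(self, record_id):
        self.last_modified[record_id] = self.now()

    def touch(self, record_id):
        # Touching only updates the timestamp of an existing record.
        if record_id in self.last_modified:
            self.last_modified[record_id] = self.now()


class ToyCollector:
    """Hypothetical collector: mark touches, sweep deletes untouched records."""

    def __init__(self, store, referenced):
        self.store = store
        self.referenced = list(referenced)
        self.start = None

    def mark(self):
        # Remember when this run started, then touch everything still in use.
        self.start = self.store.now()
        for record_id in self.referenced:
            self.store.touch(record_id)

    def sweep(self):
        # Delete records not modified since this run started.
        stale = [r for r, t in self.store.last_modified.items() if t < self.start]
        for r in stale:
            del self.store.last_modified[r]
        return sorted(stale)


def run_overlapping_gc():
    store = ToyDataStore()
    for record_id in ["a", "b", "c", "d"]:
        store.add(record_id)
    referenced = ["a", "b"]  # "c" and "d" are garbage

    node1 = ToyCollector(store, referenced)
    node2 = ToyCollector(store, referenced)

    # Interleave the two runs: node 2 starts before node 1 has swept.
    node1.mark()
    node2.mark()
    deleted_by_node1 = node1.sweep()
    deleted_by_node2 = node2.sweep()
    return deleted_by_node1, deleted_by_node2, sorted(store.last_modified)
```

In this model the first sweep removes the two unreferenced records, the second sweep finds nothing left to delete, and the referenced records survive because both runs touched them after their own start time.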

So it shouldn't be a problem. Of course, I would still avoid running it
concurrently: it's enough to run it on one cluster node, so running it
on several at once is simply a waste of time.

Regards,
Thomas


On 5/26/11 1:22 PM, "John Langley" <langleyatwork@gmail.com> wrote:

>First off, thanks to the writers of this great little description of how
>to do garbage collection, and to Fabian for pointing it out.
>http://wiki.apache.org/jackrabbit/DataStore#Data_Store_Garbage_Collection
>
>My next question concerns running garbage collection in a cluster. If I
>had a number of identical nodes running in a cluster, each of them
>periodically running a garbage collection task, where the periods may
>overlap... say node 1 starts and then, in the middle of either the mark
>or the sweep, node 2 starts its mark or perhaps even overlaps its
>sweep... what will the consequences be? Will they "collide", i.e. will
>there be unexpected errors (explicit exception-based errors) or
>misbehaviors (implicit non-identified errors)?
>
>Of course, the alternative is to guarantee that only one node in the
>cluster is responsible for the periodic mark and sweep.
>
>Thanks in advance for any pointers or insights. This community has been
>GREAT at responding to questions with very helpful solutions and bug
>fixes.
>
>-- Langley

