jackrabbit-oak-dev mailing list archives

From Michael Dürig <mdue...@apache.org>
Subject Re: [MongoMK] BlobStore garbage collection
Date Wed, 07 Nov 2012 10:37:54 GMT


On 7.11.12 10:24, Stefan Guggisberg wrote:
> On Wed, Nov 7, 2012 at 10:52 AM, Michael Dürig <mduerig@apache.org> wrote:
>>
>>
>> On 7.11.12 9:48, Thomas Mueller wrote:
>>>
>>> Hi,
>>>
>>> Didn't we talk once about defining a format for blob id references, so
>>> that a value of the format "bin:{blobId}" (or similar) is a reference?
>>
>>
>> This is exactly the problem I wanted to pinpoint. There is a conceptual leak
>> here: in order for the Microkernel implementation to know that something is
>> a reference to a binary, it has to know about the interpretation of the
>> items in the repository by the upper layers.
>
> the format of references to binaries is documented in the MicroKernel java doc,
> see "Retention Policy for Binaries" [0].

Oh... missed that ;-) Thanks for clarifying. Oak core currently does not
take this into account and uses a different format for referencing
binaries. I'll create an issue.
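
To make the idea of a recognizable reference format concrete, here is a
minimal sketch of how a layer could tell blob references apart from ordinary
values. It assumes the "bin:{blobId}" prefix suggested earlier in this thread;
the format actually documented in the MicroKernel Javadoc may differ, and the
names BLOB_REF_PREFIX, blob_id_of and referenced_blob_ids are hypothetical.

```python
# Hypothetical prefix taken from the "bin:{blobId}" suggestion in this thread;
# the format documented in the MicroKernel Javadoc may differ.
BLOB_REF_PREFIX = "bin:"


def blob_id_of(value):
    """Return the blob id if value looks like a blob reference, else None."""
    if isinstance(value, str) and value.startswith(BLOB_REF_PREFIX):
        blob_id = value[len(BLOB_REF_PREFIX):]
        # An empty id after the prefix is not treated as a reference.
        return blob_id or None
    return None


def referenced_blob_ids(properties):
    """Collect the blob ids referenced by one node's property values."""
    ids = set()
    for value in properties.values():
        # Multi-valued properties are modelled as lists here (an assumption).
        values = value if isinstance(value, list) else [value]
        for v in values:
            blob_id = blob_id_of(v)
            if blob_id is not None:
                ids.add(blob_id)
    return ids
```

With such a convention the lower layer only needs to agree on the prefix, not
on how the upper layers interpret the rest of the content.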

Michael

>
> cheers
> stefan
>
> [0] http://svn.apache.org/repos/asf/jackrabbit/oak/trunk/oak-mk-api/src/main/java/org/apache/jackrabbit/mk/api/MicroKernel.java
>
>>
>> Michael
>>
>>
>>>
>>> Regards,
>>> Thomas
>>>
>>>
>>>
>>> On 11/7/12 10:17 AM, "Michael Dürig" <mduerig@apache.org> wrote:
>>>
>>>>
>>>> On a related note: how does the garbage collector even find out whether
>>>> a binary is "referenced"? That is, on the Microkernel level, what does
>>>> it actually mean for a binary to be referenced?
>>>>
>>>> Michael
>>>>
>>>> On 6.11.12 18:45, Michael Marth wrote:
>>>>>
>>>>> this might be a weird question from left field, but are we actually
>>>>> sure that the existing data store concept is worth the trouble? afaiu it
>>>>> saves us from storing the same binary twice, but leads into the DSGC
>>>>> topic. would it be possible to make it optional to store/address
>>>>> binaries by hash (and thus not need DSGC for these configurations)?
>>>>>
>>>>> In any case we should definitely avoid requiring repo traversal for
>>>>> DSGC. This would operationally limit the repo sizes Oak can support.
>>>>>
>>>>>
>>>>> --
>>>>> Michael Marth | Engineering Manager
>>>>> +41 61 226 55 22 | mmarth@adobe.com<mailto:mmarth@adobe.com>
>>>>> Barfüsserplatz 6, CH-4001 Basel, Switzerland
>>>>>
>>>>> On Nov 6, 2012, at 9:24 AM, Thomas Mueller wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> 1- What's considered an "old" node or commit? Technically, anything other
>>>>> than the head revision is old but can we remove them right away or do we
>>>>> need to retain a number of revisions? If the latter, then how far back do
>>>>> we need to retain?
>>>>>
>>>>> we discussed this a while back, no good solution back then[1]
>>>>>
>>>>> Yes. Somebody has to decide which revisions are no longer needed. Luckily
>>>>> it doesn't need to be us :-) We might set a default value (10 minutes or
>>>>> so), and then give the user the ability to change that, depending on
>>>>> whether he cares more about disk space or the ability to read old data /
>>>>> roll back to an old state.
>>>>>
>>>>> To free up disk space, BlobStore garbage collection is actually more
>>>>> important, because usually 90% of the disk space is used by the BlobStore.
>>>>> So it would be nice if items (files) in the BlobStore are deleted as soon
>>>>> as possible after deleting old revisions. In Jackrabbit 2.x we have seen
>>>>> that node and data store garbage collection that has to traverse the whole
>>>>> repository is problematic if the repository is large. So garbage
>>>>> collection can be a scalability issue: if we have to traverse all
>>>>> revisions of all nodes in order to delete unused data, we basically tie
>>>>> garbage collection speed to repository size, unless we find a way to
>>>>> run it in parallel. But running mark & sweep garbage collection completely
>>>>> in parallel is not easy (is it even possible? if yes I would have guessed
>>>>> modern JVMs would have had it for a long time). So I think if we don't need
>>>>> to traverse the repository to delete old nodes, but just traverse the
>>>>> journal, this would be much less of a problem.
>>>>>
>>>>> Regards,
>>>>> Thomas
>>>>>
>>>>>
>>>>>
>>>
>>
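
The journal-based approach from Thomas's mail quoted above can be sketched
roughly as follows. This is only an illustration under stated assumptions:
it presumes each journal entry records which blob ids a commit started and
stopped referencing (the thread does not say the journal contains this), and
the names Commit and collectable_blobs are hypothetical. The default retention
of 600 seconds mirrors the "10 minutes or so" mentioned in the quoted mail.

```python
from collections import Counter, namedtuple

# Hypothetical journal entry: commit time plus the blob ids whose references
# were added / removed by that commit. Not an actual Oak data structure.
Commit = namedtuple("Commit", "timestamp added removed")


def collectable_blobs(journal, now, retention_seconds=600):
    """Return blob ids no retained revision references, reading only the journal."""
    cutoff = now - retention_seconds
    refs = Counter()   # reference counts as of the cutoff
    seen = set()       # every blob id that ever appeared in the journal
    keep = set()       # blob ids that must survive
    for commit in sorted(journal, key=lambda c: c.timestamp):
        if commit.timestamp < cutoff:
            # Replay old commits to learn what was live at the cutoff.
            refs.update(commit.added)
            refs.subtract(commit.removed)
        else:
            # Anything referenced inside the retention window is kept.
            keep.update(commit.added)
        seen.update(commit.added)
    # Blobs still referenced at the cutoff are visible to retained revisions.
    keep.update(blob for blob, count in refs.items() if count > 0)
    return seen - keep
```

The cost depends on the length of the journal rather than on the number of
nodes in the repository, which is the property the quoted mail is after.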
