jackrabbit-oak-dev mailing list archives

From Mete Atamel <mata...@adobe.com>
Subject Re: [MongoMK] BlobStore garbage collection
Date Mon, 05 Nov 2012 11:08:48 GMT
On a related note, I think we also need a NodeStore (nodes & commits)
garbage collection in MongoMK. Otherwise, MongoDB will be full of old node
and commit data with no real benefit. The basic implementation idea is to
have a background task that periodically goes through old nodes and commits
and deletes them, but this raises questions such as:

1- What's considered an "old" node or commit? Technically, anything other
than the head revision is old but can we remove them right away or do we
need to retain a number of revisions? If the latter, then how far back do
we need to retain?

2- How often should the NodeStore GC run and for how long? How should this
be controlled?

3- Do other MicroKernel implementations handle this, if so how?

If you have any feedback on any of this, I'd like to hear it.
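
To make the questions above concrete, here is a minimal sketch of one possible retention policy. It is not MongoMK code: the class RevisionGcSketch, its in-memory TreeMap standing in for the commit collection, and the parameters maxAgeMillis, minRetained and maxDeletes are all hypothetical. It assumes revision ids and commit timestamps both increase over time. A commit counts as "old" once it exceeds a maximum age, a minimum number of recent revisions (head included) is always kept, and each run deletes at most a fixed number of commits so that its duration is bounded.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for the commit collection; not MongoMK code. */
final class RevisionGcSketch {
    // revisionId -> commit timestamp in milliseconds.
    // Assumption: revision ids and timestamps both increase over time.
    private final TreeMap<Long, Long> commits = new TreeMap<>();

    void addCommit(long revisionId, long timestampMillis) {
        commits.put(revisionId, timestampMillis);
    }

    /**
     * Removes commits older than maxAgeMillis, oldest first, but always keeps
     * the newest minRetained revisions and removes at most maxDeletes per run.
     * Returns the number of commits removed.
     */
    int collect(long nowMillis, long maxAgeMillis, int minRetained, int maxDeletes) {
        int removable = Math.min(maxDeletes, Math.max(0, commits.size() - minRetained));
        int removed = 0;
        Iterator<Map.Entry<Long, Long>> it = commits.entrySet().iterator();
        while (removed < removable && it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            if (nowMillis - entry.getValue() <= maxAgeMillis) {
                break; // everything after this entry is newer, so stop here
            }
            it.remove();
            removed++;
        }
        return removed;
    }

    int size() {
        return commits.size();
    }

    boolean contains(long revisionId) {
        return commits.containsKey(revisionId);
    }
}
```

A background task could call collect(...) on a fixed schedule; how often to run it and which values to pick for the three limits are exactly the open questions listed above.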


On 11/2/12 4:38 PM, "Mete Atamel" <matamel@adobe.com> wrote:

>Thanks. Yes, I also think it's worthwhile to try implementing MongoDB
>BlobStore based on AbstractBlobStore. Do we have tests somewhere where we
>can compare different BlobStore implementations?
>
>On 11/2/12 3:50 PM, "Thomas Mueller" <mueller@adobe.com> wrote:
>>I would definitely at least *try* to implement a MongoDB BlobStore based
>>on the AbstractBlobStore. It should be quite simple (one class). Then, it
>>would be interesting to know which implementation is faster: the GridFS
>>one or an implementation based on AbstractBlobStore :-) Especially if the
>>difference is big. If GridFS is faster, maybe we could learn something
>>from them.
>>It looks like GridFS uses md5 hashes, that sounds a bit risky to me,
>>especially if anonymous users can create binaries. An attacker could create
>>two files with the same md5 hash, which would at least "confuse" Oak and
>>maybe GridFS, or maybe worse. I mean, using md5 for your own files is
>>fine, but it seems problematic for Oak, because it would somewhat limit
>>the use cases.
>>On 11/2/12 10:30 AM, "Mete Atamel" <matamel@adobe.com> wrote:
>>>One of the things I need to implement for MongoMK is BlobStore garbage
>>>collection. I see that there's an initial implementation for garbage
>>>collection in AbstractBlobStore in oak-mk and I also see this bug [0] to
>>>improve that initial implementation.
>>>MongoMK uses a GridFS based BlobStore, separate from AbstractBlobStore in
>>>oak-mk. I could potentially come up with my own GC, based on that GridFS
>>>implementation, or I could try a new AbstractBlobStore implementation for
>>>MongoMK (not GridFS based). With the second approach, I potentially get
>>>current and future garbage collection improvements for free.
>>>Not sure which path to follow yet but I wanted to see what others think
>>>before starting to work on it.
>>>[0] https://issues.apache.org/jira/browse/OAK-377
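
As an illustration of the two BlobStore topics in the quoted thread, the following sketch combines blob ids derived from a SHA-256 content hash (instead of md5) with a generic mark-and-sweep collection. It is only an assumption about how such a store might look, not the AbstractBlobStore or GridFS implementation: the class BlobGcSketch, its in-memory map, and the method names put, startMark, mark and sweep are chosen for illustration.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory blob store; illustrates the idea only. */
final class BlobGcSketch {
    private final Map<String, byte[]> blobs = new HashMap<>();
    private final Set<String> marked = new HashSet<>();

    /** Stores the data under its SHA-256 hex digest and returns that id. */
    String put(byte[] data) {
        String id = sha256Hex(data);
        blobs.putIfAbsent(id, data.clone());
        return id;
    }

    /** Begins a collection cycle by forgetting earlier marks. */
    void startMark() {
        marked.clear();
    }

    /** Records that a blob is still referenced, for example by a live node. */
    void mark(String blobId) {
        marked.add(blobId);
    }

    /** Deletes every blob that was not marked; returns the number deleted. */
    int sweep() {
        int before = blobs.size();
        blobs.keySet().retainAll(marked);
        marked.clear();
        return before - blobs.size();
    }

    boolean contains(String blobId) {
        return blobs.containsKey(blobId);
    }

    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform implementation
            throw new IllegalStateException(e);
        }
    }
}
```

A real collector would also have to deal with blobs written while the mark phase is running, which this sketch ignores.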
