jackrabbit-oak-dev mailing list archives

From Chetan Mehrotra <chetan.mehro...@gmail.com>
Subject Re: Strategies around storing blobs in Mongo
Date Wed, 30 Oct 2013 09:34:46 GMT
> sounds reasonable. what is the impact of such a design when it comes
> to map-reduce features? I was thinking that we could use it e.g. for
> garbage collection, but I don't know if this is still an option when data
> is spread across multiple databases.

I would investigate that aspect further.
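
One data point already: map/reduce in Mongo operates on a single input
collection, so having the blobs collection in its own database should not
rule it out per se; the job just has to be issued against that database's
handle. A rough sketch with the 2.x Java driver (collection and field names
such as "blobs" and "blobId" are made up here for illustration):

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MapReduceCommand;
import com.mongodb.MapReduceOutput;
import com.mongodb.MongoClient;

public class BlobMapReduceSketch {

    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("localhost", 27017);
        // same server, but a separate database holding only the blobs collection
        DB blobDb = client.getDB("oak-blobs");
        DBCollection blobs = blobDb.getCollection("blobs");

        // hypothetical job: count chunks per blob id, e.g. as input for GC
        String map = "function() { emit(this.blobId, 1); }";
        String reduce = "function(key, values) { return Array.sum(values); }";

        MapReduceCommand cmd = new MapReduceCommand(blobs, map, reduce,
                null, MapReduceCommand.OutputType.INLINE, new BasicDBObject());
        MapReduceOutput out = blobs.mapReduce(cmd);
        for (DBObject chunkCount : out.results()) {
            System.out.println(chunkCount);
        }
        client.close();
    }
}

What it cannot do is join against the node data in the other database, so a
mark phase over the node documents would presumably still run separately.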

> connecting to a second server would add quite some complexity to
Yup. The option was just provided for completeness' sake, and something
like this would probably never be required.

> that was one of my initial thoughts as well, but I was wondering what
> the impact of such a deployment is on data store garbage collection.

We could probably create a shadow node for each binary in the blobs
collection and keep the binary content within the DataStore itself. Stuff
like garbage collection would then be performed on the shadow nodes, and the
logic would use those results to perform the actual deletions.
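
To make that a bit more concrete, something along these lines (field names
are purely illustrative, not a schema proposal): the blobs collection would
only hold a small shadow document per binary, carrying the DataStore
identifier plus whatever bookkeeping GC needs, while the bytes themselves
stay in the DataStore.

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

public class ShadowNodeSketch {

    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("localhost", 27017);
        DB db = client.getDB("oak");
        DBCollection blobs = db.getCollection("blobs");

        // shadow document: metadata only, the binary itself lives in the
        // (file/S3) DataStore under the given identifier
        BasicDBObject shadow = new BasicDBObject("_id", "dataStoreId-123abc")
                .append("length", 2L * 1024 * 1024)
                .append("lastMod", System.currentTimeMillis())
                .append("inDataStore", true);
        blobs.insert(shadow);

        // GC would mark/sweep over these shadow documents and use the result
        // to delete the corresponding records from the DataStore
        client.close();
    }
}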


Chetan Mehrotra


On Wed, Oct 30, 2013 at 1:13 PM, Marcel Reutegger <mreutegg@adobe.com> wrote:
> Hi,
>
>> Currently we are storing blobs by breaking them into small chunks and
>> then storing those chunks in MongoDB as part of the blobs collection. This
>> approach would cause issues as Mongo maintains a global exclusive
>> write lock at the per-database level [1]. So even writing multiple
>> small chunks of say 2 MB each would lead to write lock contention.
>
> so far we observed high lock contention primarily when there are a lot of
> updates. inserts were not that big of a problem, because you can batch
> them. it would probably be good to have a test to see how big the
> impact is when blobs come into play.
>
>> Mongo also provides GridFS [2]. However, it uses a strategy similar to
>> the one we are currently using, and the support is built into the
>> driver. For the server they are just collection entries.
>>
>> So to minimize write lock contention for use cases where big
>> assets are being stored in Oak, we can opt for the following strategies:
>>
>> 1. Store the blobs collection in a different database. As Mongo write
>> locks [1] are taken at the db level, storing the blobs in a different
>> db would allow reads/writes of node data (the majority use case) to
>> continue.
>
> sounds reasonable. what is the impact of such a design when it comes
> to map-reduce features? I was thinking that we could use it e.g. for
> garbage collection, but I don't know if this is still an option when data
> is spread across multiple databases.
>
>> 2. For more asset/binary-heavy use cases, use a separate database server
>> itself to serve the binaries.
>
> connecting to a second server would add quite some complexity to
> the system. wouldn't it be easier to just leverage standard mongodb
> sharding to distribute the load?
>
>> 3. Bring back the JR2 DataStore implementation and just save metadata
>> related to binaries in Mongo. We already have an S3-based implementation
>> there, and it would continue to work with Oak as well.
>
> that was one of my initial thoughts as well, but I was wondering what
> the impact of such a deployment is on data store garbage collection.
>
> regards
>  marcel
>
>> Chetan Mehrotra
>> [1] http://docs.mongodb.org/manual/faq/concurrency/#how-granular-are-locks-in-mongodb
>> [2] http://docs.mongodb.org/manual/core/gridfs/
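
PS: regarding the batching point above ("inserts were not that big of a
problem, because you can batch them"), here is a rough sketch of what a
chunked blob write with a single batched insert might look like (chunk size
and field names are arbitrary), in case it is useful when setting up the
suggested test:

import java.util.ArrayList;
import java.util.List;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

public class ChunkedBlobWriteSketch {

    private static final int CHUNK_SIZE = 2 * 1024 * 1024; // 2 MB, arbitrary

    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("localhost", 27017);
        DB db = client.getDB("oak");
        DBCollection blobs = db.getCollection("blobs");

        byte[] data = new byte[10 * 1024 * 1024]; // stand-in for the binary
        String blobId = "blob-1";

        // split the binary into fixed-size chunks and send them in one
        // batched insert rather than one insert call per chunk
        List<DBObject> chunks = new ArrayList<DBObject>();
        for (int offset = 0, seq = 0; offset < data.length; offset += CHUNK_SIZE, seq++) {
            int len = Math.min(CHUNK_SIZE, data.length - offset);
            byte[] chunk = new byte[len];
            System.arraycopy(data, offset, chunk, 0, len);
            chunks.add(new BasicDBObject("blobId", blobId)
                    .append("seq", seq)
                    .append("data", chunk));
        }
        blobs.insert(chunks);
        client.close();
    }
}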
