jackrabbit-oak-dev mailing list archives

From Amit Jain <am...@ieee.org>
Subject Re: Strategies around storing blobs in Mongo
Date Wed, 30 Oct 2013 10:34:50 GMT
>> So even adding a 2 MB chunk on a sharded system over a remote
>> connection would block reads for that complete duration. So at a
>> minimum we should avoid that.

I guess if there are read replicas in the shard replica set, then that
would mitigate the effect to some extent.
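The chunked-blob scheme under discussion can be sketched as follows. This is a minimal, hypothetical illustration, not Oak's actual blob store code. It assumes a 2 MB chunk size (the figure used in this thread), SHA-256 content hashes as chunk ids, and a plain dict standing in for the MongoDB blobs collection. Each new chunk is a separate insert, so one large binary turns into many writes competing for the same per-database lock.

```python
import hashlib

CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB, the chunk size mentioned in the thread (assumption)


def write_blob(data: bytes, blobs: dict, chunk_size: int = CHUNK_SIZE) -> list:
    """Split `data` into fixed-size chunks and store each one in `blobs`.

    `blobs` is a hypothetical stand-in for the MongoDB blobs collection:
    each key is a chunk id (SHA-256 hex digest of the chunk's bytes) and
    each value is the chunk itself. Returns the ordered list of chunk ids,
    which is what a node would reference to reassemble the binary.
    """
    chunk_ids = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        chunk_id = hashlib.sha256(chunk).hexdigest()
        # Each new chunk becomes its own insert; a chunk whose id is
        # already present is not stored again.
        blobs.setdefault(chunk_id, chunk)
        chunk_ids.append(chunk_id)
    return chunk_ids


def read_blob(chunk_ids: list, blobs: dict) -> bytes:
    """Reassemble a binary from its ordered chunk ids."""
    return b"".join(blobs[chunk_id] for chunk_id in chunk_ids)
```

For a 5 MB binary this produces three chunks (2 MB, 2 MB and 1 MB), that is, up to three separate inserts into the blobs collection.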



On Wed, Oct 30, 2013 at 3:04 PM, Chetan Mehrotra
<chetan.mehrotra@gmail.com> wrote:

> > sounds reasonable. what is the impact of such a design when it comes
> > to map-reduce features? I was thinking that we could use it e.g. for
> > garbage collection, but I don't know if this is still an option when data
> > is spread across multiple databases.
>
> I will investigate that aspect further.
>
> > connecting to a second server would add quite some complexity to
> Yup. That option was listed only for completeness' sake, and something
> like this would probably never be required.
>
> > that was one of my initial thoughts as well, but I was wondering what
> > the impact of such a deployment is on data store garbage collection.
>
> Perhaps we can create a shadow node for each binary in the blobs
> collection and keep the binary content itself in the DataStore.
> Garbage collection would then be performed on the shadow nodes, and
> its results would be used to perform the actual deletions.
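The shadow-node approach described above can be sketched as a mark-and-sweep pass. This is a hypothetical sketch, not Oak code. `shadow_nodes` stands in for the blobs collection (one shadow entry per binary id), `node_references` stands in for the blob ids still referenced from node data, and `data_store` stands in for the external DataStore. The mark and sweep work only on the shadow entries, and the resulting set of unreferenced ids drives the actual deletions in the DataStore.

```python
def collect_garbage(node_references, shadow_nodes: set, data_store: dict) -> set:
    """Mark-and-sweep over shadow nodes. Returns the set of ids that were deleted.

    node_references: iterable of blob ids still referenced by node data
    shadow_nodes:    hypothetical stand-in for the blobs collection (one entry per binary)
    data_store:      hypothetical stand-in for the DataStore, mapping binary id -> content
    """
    # Mark: every blob id referenced from node data is live.
    live = set(node_references)
    # Sweep: shadow entries without a live reference are garbage.
    garbage = shadow_nodes - live
    for blob_id in garbage:
        # Delete the binary first. An interrupted run then leaves a shadow
        # entry that the next run retries, not an orphaned binary.
        data_store.pop(blob_id, None)
        shadow_nodes.discard(blob_id)
    return garbage
```

This sketch ignores concurrent writes. A real implementation would also need to protect binaries that are added while the collection runs, for example with a timestamp cutoff.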
>
>
> Chetan Mehrotra
>
>
> On Wed, Oct 30, 2013 at 1:13 PM, Marcel Reutegger <mreutegg@adobe.com>
> wrote:
> > Hi,
> >
> >> Currently we are storing blobs by breaking them into small chunks and
> >> then storing those chunks in MongoDB as part of the blobs collection.
> >> This approach would cause issues, as Mongo maintains an exclusive
> >> write lock per database [1]. So even writing multiple small chunks
> >> of, say, 2 MB each would lead to write lock contention.
> >
> > so far we observed high lock contention primarily when there are a lot of
> > updates. inserts were not that big of a problem, because you can batch
> > them. it would probably be good to have a test to see how big the
> > impact is when blobs come into play.
> >
> >> Mongo also provides GridFS [2]. However, it uses a strategy similar to
> >> the one we currently use, and that support is built into the driver.
> >> To the server, the chunks are just collection entries.
> >>
> >> So to minimize write lock contention for use cases where big assets
> >> are stored in Oak, we can opt for one of the following strategies:
> >>
> >> 1. Store the blobs collection in a different database. As Mongo write
> >> locks [1] are taken at the database level, storing the blobs in a
> >> different db would allow reads/writes of node data (the majority use
> >> case) to continue.
> >
> > sounds reasonable. what is the impact of such a design when it comes
> > to map-reduce features? I was thinking that we could use it e.g. for
> > garbage collection, but I don't know if this is still an option when data
> > is spread across multiple databases.
> >
> >> 2. For more asset/binary-heavy use cases, use a separate database
> >> server to serve the binaries.
> >
> > connecting to a second server would add quite some complexity to
> > the system. wouldn't it be easier to just leverage standard mongodb
> > sharding to distribute the load?
> >
> >> 3. Bring back the JR2 DataStore implementation and just save metadata
> >> related to binaries in Mongo. We already have an S3-based implementation
> >> there, and it would continue to work with Oak as well.
> >
> > that was one of my initial thoughts as well, but I was wondering what
> > the impact of such a deployment is on data store garbage collection.
> >
> > regards
> >  marcel
> >
> >> Chetan Mehrotra
> >> [1] http://docs.mongodb.org/manual/faq/concurrency/#how-granular-are-locks-in-mongodb
> >> [2] http://docs.mongodb.org/manual/core/gridfs/
>
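Strategy 3 in the thread keeps only binary metadata in Mongo. The following is a minimal, hypothetical sketch of that write path. It assumes a content-addressed DataStore, meaning binaries are keyed by a digest of their content (SHA-256 is an assumption here). Plain dicts stand in for both the DataStore and the Mongo metadata collection, and the class and field names are illustrative only. Each binary costs Mongo only one small metadata document, no matter how large the binary is.

```python
import hashlib


class MetadataOnlyBlobStore:
    """Hypothetical sketch: binary content in a DataStore, metadata in Mongo.

    `data_store` and `metadata` are plain dicts standing in for the external
    DataStore (for example file- or S3-backed) and a Mongo collection.
    """

    def __init__(self):
        self.data_store = {}  # content id -> bytes (kept outside Mongo)
        self.metadata = {}    # content id -> small metadata document (kept in Mongo)

    def write(self, data: bytes) -> str:
        content_id = hashlib.sha256(data).hexdigest()
        if content_id not in self.data_store:
            # The large write goes to the DataStore, not to Mongo.
            self.data_store[content_id] = data
        # The only Mongo write: a small document, independent of binary size.
        self.metadata[content_id] = {"_id": content_id, "length": len(data)}
        return content_id

    def read(self, content_id: str) -> bytes:
        return self.data_store[content_id]
```

The per-binary metadata documents are also a natural place for the shadow nodes that Chetan suggests for garbage collection.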
