jackrabbit-oak-dev mailing list archives

From Tyson Norris <tnor...@adobe.com>
Subject Re: MongoMK^2 design proposal
Date Tue, 29 Jan 2013 15:15:25 GMT

On Jan 29, 2013, at 3:45 AM, "Jukka Zitting" <jukka.zitting@gmail.com> wrote:

> Hi,
> On Tue, Jan 29, 2013 at 1:21 PM, Thomas Mueller <mueller@adobe.com> wrote:
>> It's not clear to me how to support scalable concurrent writes. This is
>> also a problem with the current MongoMK design, but in your design I
>> actually see more problems in this area (concurrent writes to nodes in the
>> same segment for example). But maybe it's just that I don't understand
>> this part of your design yet.
> Segments are immutable, so a commit would create a new segment instead
> of modifying an existing one. The new segment would contain just the
> modified parts of the tree and refer to the older segment(s) for the
> remaining tree. A quick estimate of the size overhead of a minimal
> commit that updates just a single property is in the order of hundreds
> of bytes, depending a bit on the content structure.

Does this mean that modifying the same content 10 times, changing "most properties" on a
given node each time, will grow the repo 10x?
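
To make the question concrete, here is a rough Java sketch of the behavior described in the quoted paragraph. It is only a toy in-memory model, not the proposed implementation: the Segment class, the commit helper and the storedRecords counter are hypothetical names, and a node is modelled as a plain map of string properties. Under those assumptions each commit adds one small segment holding just the changed node, so ten rewrites of one node add ten node records rather than ten copies of the repository.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SegmentSketch {
    /** Hypothetical immutable segment: holds only the nodes changed by one commit. */
    static final class Segment {
        final Segment previous;                        // older segment, or null for the first one
        final Map<String, Map<String, String>> nodes;  // path -> properties written by this commit

        Segment(Segment previous, Map<String, Map<String, String>> changed) {
            this.previous = previous;
            // Defensive copy so the segment cannot change after creation.
            Map<String, Map<String, String>> copy = new HashMap<>();
            for (Map.Entry<String, Map<String, String>> e : changed.entrySet()) {
                copy.put(e.getKey(), Collections.unmodifiableMap(new HashMap<>(e.getValue())));
            }
            this.nodes = Collections.unmodifiableMap(copy);
        }

        /** Reads a node from this segment, falling back to older segments. */
        Map<String, String> read(String path) {
            for (Segment s = this; s != null; s = s.previous) {
                Map<String, String> props = s.nodes.get(path);
                if (props != null) {
                    return props;
                }
            }
            return null;
        }

        /** Counts node records stored across the whole chain of segments. */
        int storedRecords() {
            int count = 0;
            for (Segment s = this; s != null; s = s.previous) {
                count += s.nodes.size();
            }
            return count;
        }
    }

    /** A commit never modifies the head; it creates a new segment that refers to it. */
    static Segment commit(Segment head, String path, Map<String, String> props) {
        return new Segment(head, Collections.singletonMap(path, props));
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> initial = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            initial.put("/node" + i, Collections.singletonMap("title", "v0"));
        }
        Segment head = new Segment(null, initial);
        for (int i = 1; i <= 10; i++) {
            head = commit(head, "/node0", Collections.singletonMap("title", "v" + i));
        }
        System.out.println(head.storedRecords()); // 100 initial records + 10 rewrites
        System.out.println(head.read("/node0"));  // latest version of the rewritten node
        System.out.println(head.read("/node1"));  // untouched node, read from the oldest segment
    }
}
```

In this model the growth is proportional to the size of the modified node times the number of commits; whether old segments can later be reclaimed is outside the sketch.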


>> The data format in your proposal seems to be binary and not Json. For me,
>> using Json would have the advantage that we can use MongoDb features
>> (queries, indexes, atomic operations, debugging,..). With your design,
>> only 1% of the MongoDb features could be used (store a record, read a
>> record), so that basically we would need to implement the remaining
>> features ourselves. On the other hand, it would be extremely simple to
>> port to another storage engine. As far as I understand, all the data might
>> as well be stored in the data store / blob store with very little changes.
> Right. In addition to storage-independence, the main reason for going
> with a custom binary format instead of JSON was to avoid having to
> parse an entire segment just to access an individual node or value.
> Note that the proposed design actually does rely on lots of MongoDB
> features beyond basic CRUD. Things like sharding, distributed access,
> atomic updates, etc. are essential for the design to scale up well.
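
The point about not parsing a whole segment can be illustrated with a small Java sketch. The layout below (a 4-byte length followed by UTF-8 bytes) and the names writeString and readString are assumptions made for illustration only; the actual record format of the proposal is not described in this thread. Given a record's offset, a value is read directly from the segment bytes without touching the records around it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class RecordAccessSketch {
    /** Appends a length-prefixed UTF-8 string and returns the offset it was written at. */
    static int writeString(ByteArrayOutputStream out, String value) {
        int offset = out.size();
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        byte[] header = ByteBuffer.allocate(4).putInt(utf8.length).array();
        out.write(header, 0, header.length);
        out.write(utf8, 0, utf8.length);
        return offset;
    }

    /** Reads one string record at the given offset; no other record is parsed. */
    static String readString(byte[] segment, int offset) {
        int length = ByteBuffer.wrap(segment).getInt(offset); // absolute read of the length header
        return new String(segment, offset + 4, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, "alpha");
        int second = writeString(out, "beta");
        writeString(out, "gamma");
        byte[] segment = out.toByteArray();
        System.out.println(readString(segment, second)); // prints "beta"
    }
}
```

A JSON document holding the same three values would normally have to be parsed from the start before the second value could be returned.
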
>> As far as I understand, a commit where only one single value is changed
>> would result in one journal entry and one segment. I was thinking, would
>> it be possible to split a segment / journal into smaller blocks in such
>> a case, but I'm not sure how complex that would be. And the reverse: merge
>> small segments from time to time.
> Indeed, see my response to Marcel's post.
> BR,
> Jukka Zitting
