jackrabbit-oak-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: MongoMK^2 design proposal
Date Tue, 29 Jan 2013 11:44:09 GMT

On Tue, Jan 29, 2013 at 1:21 PM, Thomas Mueller <mueller@adobe.com> wrote:
> It's not clear to me how to support scalable concurrent writes. This is
> also a problem with the current MongoMK design, but in your design I
> actually see more problems in this area (concurrent writes to nodes in
> the same segment, for example). But maybe it's just that I don't
> understand this part of your design yet.

Segments are immutable, so a commit would create a new segment instead
of modifying an existing one. The new segment would contain just the
modified parts of the tree and refer to the older segment(s) for the
remaining tree. A quick estimate puts the size overhead of a minimal
commit that updates just a single property on the order of hundreds
of bytes, depending a bit on the content structure.
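
The copy-on-write scheme described above can be sketched roughly as follows. This is an illustrative model only; the class and function names (Segment, commit) are invented for the sketch and are not the actual Oak API or storage layout.

```python
# Sketch: segments are immutable, and a commit writes a new segment
# containing only the changed nodes, with a reference back to the
# older segment for everything unchanged.

class Segment:
    """An immutable bundle of node records, keyed by path."""
    def __init__(self, records, base=None):
        self.records = dict(records)  # path -> properties
        self.base = base              # older segment for unchanged parts

    def read(self, path):
        # Look in this segment first, then fall back to the base segment(s)
        if path in self.records:
            return self.records[path]
        if self.base is not None:
            return self.base.read(path)
        raise KeyError(path)

def commit(head, changes):
    """Create a new segment holding only the modified nodes."""
    return Segment(changes, base=head)

# Initial segment with two nodes
s1 = Segment({"/": {"title": "root"}, "/a": {"x": 1}})
# A commit that updates a single property creates a small new segment
s2 = commit(s1, {"/a": {"x": 2}})

assert s2.read("/a") == {"x": 2}          # modified node comes from s2
assert s2.read("/") == {"title": "root"}  # unchanged node read via s1
assert s1.read("/a") == {"x": 1}          # s1 itself is untouched
```

The new head segment stays small because it stores only the delta, which is where the hundreds-of-bytes estimate for a single-property commit comes from.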

> The data format in your proposal seems to be binary and not JSON. For me,
> using JSON would have the advantage that we can use MongoDB features
> (queries, indexes, atomic operations, debugging, ...). With your design,
> only 1% of the MongoDB features could be used (store a record, read a
> record), so basically we would need to implement the remaining
> features ourselves. On the other hand, it would be extremely simple to
> port to another storage engine. As far as I understand, all the data might
> as well be stored in the data store / blob store with very few changes.

Right. In addition to storage-independence, the main reason for going
with a custom binary format instead of JSON was to avoid having to
parse an entire segment just to access an individual node or value.
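
The access-cost difference can be illustrated with a toy layout. The length-prefixed body plus offset index below is invented for the sketch and is not the actual MongoMK^2 segment format; it only shows why a binary layout allows reading one record without decoding the whole segment, while JSON must be parsed in full first.

```python
import json

# Toy "segment": record bytes concatenated into one body, with an
# offset index mapping each path to (offset, length) within the body.
records = {"/a": b"value-a", "/b": b"value-b", "/c": b"value-c"}

body = b""
index = {}
for path, data in records.items():
    index[path] = (len(body), len(data))  # where this record lives
    body += data

def read_record(segment_body, index, path):
    # Random access: slice out just the bytes for one record
    offset, length = index[path]
    return segment_body[offset:offset + length]

assert read_record(body, index, "/b") == b"value-b"

# JSON equivalent: the entire segment must be decoded before any
# single value can be read.
json_segment = json.dumps({p: d.decode() for p, d in records.items()})
assert json.loads(json_segment)["/b"] == "value-b"
```

With an on-disk or in-MongoDB segment, the slice becomes a seek-and-read of a few bytes instead of a full-document parse.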

Note that the proposed design actually does rely on lots of MongoDB
features beyond basic CRUD. Things like sharding, distributed access,
atomic updates, etc. are essential for the design to scale up well.
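
The atomic updates matter most for advancing the journal head: a writer may move the head only if no other writer has moved it first. MongoDB expresses this as a conditional update on the head document; the sketch below stands in for that with a lock-protected compare-and-set in memory. The Journal class and revision names are illustrative, not part of the actual design.

```python
import threading

class Journal:
    """Minimal model of a journal whose head advances via compare-and-set."""
    def __init__(self, head):
        self._head = head
        self._lock = threading.Lock()

    def cas_head(self, expected, new):
        """Atomically set head to `new` iff it still equals `expected`."""
        with self._lock:
            if self._head == expected:
                self._head = new
                return True
            return False  # lost the race; caller must rebase and retry

    @property
    def head(self):
        with self._lock:
            return self._head

j = Journal("rev-1")
assert j.cas_head("rev-1", "rev-2")      # first writer wins
assert not j.cas_head("rev-1", "rev-3")  # stale writer must rebase and retry
assert j.head == "rev-2"
```

A writer that loses the race re-reads the new head, rebases its segment on it, and tries again, so no global lock is needed and the scheme shards cleanly.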

> As far as I understand, a commit where only one single value is changed
> would result in one journal entry and one segment. I was thinking, would
> it be possible to split a segment / journal into smaller blocks in such
> a case, but I'm not sure how complex that would be. And the reverse:
> merge small segments from time to time.

Indeed, see my response to Marcel's post.


Jukka Zitting
