jackrabbit-oak-dev mailing list archives

From: Felix Meschberger <fmesc...@adobe.com>
Subject: Re: Design idea for a production-scale in-memory microkernel
Date: Thu, 09 Aug 2012 08:29:29 GMT
Hi,

Interesting thoughts.

To double down on the assumption of keeping the complete tree in memory, I'd like to add
that the most dramatic improvement in overall Jackrabbit performance for a given
configuration can probably be achieved by increasing the bundle cache size, which is
eventually more or less what you are proposing.
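
(For reference, that cache is sized per persistence manager in repository.xml; the fragment
below is only illustrative, the persistence manager class and the 256 value are examples and
the parameter is given in MB, with a default of 8:

    <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.BundleDbPersistenceManager">
      <!-- connection parameters omitted; bundleCacheSize is given in MB -->
      <param name="bundleCacheSize" value="256"/>
    </PersistenceManager>
)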

Regards
Felix

On 07.08.2012, at 16:07, Jukka Zitting wrote:

> Hi,
> 
> [Just throwing an idea around, no active plans for further work on this.]
> 
> One of the biggest performance bottlenecks with current repository
> implementations is disk speed, especially seek times but also raw data
> transfer rate in many cases. To work around those limitations, in
> Jackrabbit we've used various caching strategies that considerably
> complicate the codebase and still struggle with cache misses and
> write-through performance.
> 
> As an alternative to such designs, I was thinking of a microkernel
> implementation that would keep the *entire* tree structure in memory,
> i.e. only use the disk or another backend for binaries and possibly
> for periodic backup dumps. Fault tolerance against hardware failures
> or other restarts would be achieved by requiring a clustered
> deployment where all content is kept as copies on at least three
> separate physical servers. Redis (http://redis.io/) is a good example
> of the potential performance gains of such a design.
> 
> To estimate how much memory such a model would need, I looked at the
> average bundle size of a vanilla CQ5 installation. There the average
> bundle (i.e. a node with all its properties and child node references)
> size is just 251 bytes. Even assuming larger bundles and some level of
> storage and index overhead, it seems safe to budget up to about 1kB of
> memory per node on average. That would allow one to store some 1M
> nodes in each 1GB of memory.
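
Back-of-the-envelope, that budget works out as follows (just a sketch of the arithmetic
implied above; the 251 bytes are the measured average, the 1kB is the padded per-node
budget):

    public class NodeMemoryEstimate {
        public static void main(String[] args) {
            long avgBundleBytes = 251;       // measured average on a vanilla CQ5 installation
            long budgetPerNode  = 1024;      // padded for larger bundles and storage/index overhead
            long bytesPerGB     = 1L << 30;
            // ~1,048,576 nodes, i.e. roughly 1M nodes per 1GB of memory
            System.out.println("nodes per GB: " + bytesPerGB / budgetPerNode);
        }
    }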
> 
> Assuming that all content is evenly spread across the cluster in a way
> that puts copies of each individual bundle on at least three different
> cluster nodes and that each cluster node additionally keeps a large
> cache of most frequently accessed content, a large repository with
> 100+M content nodes could easily run on a twelve-node cluster where
> each cluster node has 32GB RAM, a reasonable size for a modern server
> (also available from EC2 as m2.2xlarge). A mid-size repository with
> 10+M content nodes could run on a three- or four-node cluster with
> just 16GB RAM per cluster node (or m2.xlarge in EC2).
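
Those cluster sizes check out against a simple capacity calculation (again only a sketch
of the arithmetic; the 1kB per node, the three copies of each bundle and the RAM figures
are taken straight from the paragraphs above, and whatever is left over is room for
caches):

    public class ClusterCapacityCheck {
        static void check(long contentNodes, int clusterNodes, int ramGBPerNode) {
            long bytesPerNode = 1024, copies = 3, gb = 1L << 30;
            long required  = contentNodes * bytesPerNode * copies;
            long available = (long) clusterNodes * ramGBPerNode * gb;
            System.out.printf("required %d GB, available %d GB, headroom for caches %d GB%n",
                    required / gb, available / gb, (available - required) / gb);
        }

        public static void main(String[] args) {
            check(100000000L, 12, 32);  // large repository: ~286 GB needed vs 384 GB available
            check(10000000L,   4, 16);  // mid-size repository: ~29 GB needed vs 64 GB available
        }
    }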
> 
> I believe such a microkernel could set a pretty high bar for
> performance! The only major performance limit I foresee is the network
> overhead when writing (updates need to be sent to other cluster nodes)
> and during cache misses (data needs to be retrieved from other nodes).
> However, cache misses would only start affecting repositories that go
> beyond what fits in memory on a single server (i.e. the mid-size
> repository described above wouldn't yet be hit by that limit), and the
> write overhead could be amortized by allowing the nodes to temporarily
> diverge until they have a chance to sync up again in the background
> (as allowed by the MK contract).
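
To make the last point concrete, here is a deliberately minimal sketch of such an
"apply locally, replicate in the background" write path (purely illustrative; the Peer
interface and the String-based change records are hypothetical and not part of any
existing MK implementation):

    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    class AsyncReplicatingStore {

        /** Hypothetical handle to another cluster node holding a copy of the tree. */
        interface Peer { void apply(String change); }

        private final List<Peer> peers;
        private final BlockingQueue<String> pending = new LinkedBlockingQueue<String>();

        AsyncReplicatingStore(List<Peer> peers) {
            this.peers = peers;
            Thread replicator = new Thread(new Runnable() {
                public void run() { replicateLoop(); }
            }, "replicator");
            replicator.setDaemon(true);
            replicator.start();
        }

        /** A commit is acknowledged as soon as the local in-memory tree is updated. */
        void commit(String change) {
            applyLocally(change);
            pending.add(change);            // peers catch up asynchronously
        }

        private void applyLocally(String change) {
            // update the local in-memory tree structure here
        }

        private void replicateLoop() {
            try {
                while (true) {
                    String change = pending.take();
                    for (Peer peer : peers) {
                        peer.apply(change); // peers may briefly diverge, then converge again
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }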
> 
> BR,
> 
> Jukka Zitting

