jackrabbit-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: [jr3] Clustering: Scalable Writes / Asynchronous Change Merging
Date Tue, 19 Oct 2010 22:05:18 GMT
Hi,

On Tue, Oct 19, 2010 at 12:24 PM, Thomas Müller <thomas.mueller@day.com> wrote:
> The current Jackrabbit clustering doesn't scale well for writes
> because all cluster nodes use the same persistent storage. Even if
> persistence storage is clustered, the cluster journal relies on
> changes being immediately visible in all nodes. That means Jackrabbit
> clustering can scale well for reads, however it can't scale well for
> writes. This is a property Jackrabbit clustering shares with most
> clustering solutions for relational databases. Still, it would make
> sense to solve this problem for Jackrabbit 3.

Agreed. The advent of the read/write web has notably increased the
importance of scalable write functionality in web backends. We aren't
seeing the full impact of this yet, but I know that many of our users
are rolling out new sites and other applications with all sorts of
commenting, tracking and social features, and that such deployments
will sooner or later start hitting our current write bottleneck.

> == Jackrabbit 3 Clustering ==
>
> [Cluster Node 1]  <-->  [ Local Storage ]
> [Cluster Node 2]  <-->  [ Local Storage ]

I'd even like to float the idea of the local storage of each cluster
node being RAM instead of a database or the file system. Instead of
persisting changes to a disk, durability could be achieved by syncing
the changes to at least one or two other cluster nodes. But that's
probably best discussed in another thread...

> == Unique Change Set Ids ==
> [...]
> changeSetId = nanosecondsSince1970 * totalClusterNodes + clusterNodeId

We could also use normal UUIDs or SHA1 hashes of the serialized change
sets as identifiers, as long as we include timestamp information (and
perhaps the identity of the originating cluster node) with the
changes. That way you wouldn't have to make assumptions about the
cluster configuration in advance.
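
As a rough sketch of the hash-based variant: the identifier below is the
SHA-1 of the serialized change set, carried together with a timestamp
and the originating node. The class name ChangeSetId, its fields and the
ordering rule (timestamp, then node, then hash) are all hypothetical and
only meant for illustration; nothing like this exists in Jackrabbit
today.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical change set identifier; not an existing Jackrabbit class. */
final class ChangeSetId implements Comparable<ChangeSetId> {
    final long timestampMillis;  // creation time reported by the origin
    final String originNodeId;   // cluster node that produced the change set
    final String contentHash;    // hex SHA-1 of the serialized change set

    private ChangeSetId(long timestampMillis, String originNodeId,
                        String contentHash) {
        this.timestampMillis = timestampMillis;
        this.originNodeId = originNodeId;
        this.contentHash = contentHash;
    }

    /** Builds an id without any knowledge of the cluster size. */
    static ChangeSetId of(long timestampMillis, String originNodeId,
                          byte[] serializedChangeSet) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(serializedChangeSet);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return new ChangeSetId(timestampMillis, originNodeId,
                    hex.toString());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Assumed total order: timestamp first, node id and hash as tie-breakers. */
    @Override
    public int compareTo(ChangeSetId other) {
        int c = Long.compare(timestampMillis, other.timestampMillis);
        if (c != 0) {
            return c;
        }
        c = originNodeId.compareTo(other.originNodeId);
        if (c != 0) {
            return c;
        }
        return contentHash.compareTo(other.contentHash);
    }
}
```

Since the hash depends only on the content and the node id travels with
the change, a new cluster node can join without renumbering anything.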

> == How to Merge Changes ==
> [...]
> Changes with change set ids in the future are delayed. Cluster nodes
> should have reasonably synchronized clocks (it doesn't need to be
> completely exact, but it should be reasonably accurate, so that such
> delayed events are not that common).

Instead of relying on clock synchronization (many virtual servers
suffer from serious clock drift), we could leverage a virtual time
algorithm like the one described in [1].

[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.2620
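
To illustrate the general idea of ordering changes without wall-clock
time, here is a minimal Lamport-style logical clock. Note that this is a
plain logical counter, not necessarily the algorithm described in [1],
and the class name LogicalClock is made up for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical Lamport-style logical clock, one instance per cluster node. */
final class LogicalClock {
    private final AtomicLong counter = new AtomicLong();

    /** Called before creating a local change set; returns its logical time. */
    long tick() {
        return counter.incrementAndGet();
    }

    /**
     * Called when a change set arrives from another node. The local clock
     * jumps past the remote time, so later local changes sort after it.
     */
    long observe(long remoteTime) {
        return counter.updateAndGet(
                local -> Math.max(local, remoteTime) + 1);
    }

    /** Current logical time, without advancing the clock. */
    long current() {
        return counter.get();
    }
}
```

With something like this, a change set can never appear to come from
the future: receiving it simply advances the local clock, so there is
nothing to delay.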

> == Solution A: Node Granularity, Ignore Old Changes ==

As you mentioned, this is problematic.

> == Solution B: Merge Old Changes ==

This sounds promising, but needs to be reviewed for all the potential
conflicts. We'll probably need some mechanism for making the content
of conflicting changes available for clients to review even if the
merge algorithm chooses to discard them.
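
A very rough sketch of what such a mechanism could look like: the merge
below uses a placeholder "newest timestamp wins" rule per path (the
actual merge rules are still to be worked out), and any change that
loses is kept in a list that clients could inspect later. All names
(ReviewableMerge, Change, discardedChanges) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical merge that keeps discarded changes around for review. */
final class ReviewableMerge {

    /** A single property change; a stand-in for a real change set entry. */
    static final class Change {
        final String path;
        final String value;
        final long timestamp;

        Change(String path, String value, long timestamp) {
            this.path = path;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Change> current = new HashMap<>();
    private final List<Change> discarded = new ArrayList<>();

    /** Placeholder rule: the change with the newest timestamp wins. */
    void apply(Change incoming) {
        Change existing = current.get(incoming.path);
        if (existing == null || incoming.timestamp > existing.timestamp) {
            current.put(incoming.path, incoming);
        } else {
            // The late change loses, but its content is not thrown away.
            discarded.add(incoming);
        }
    }

    /** Value currently stored at the path, or null if there is none. */
    String valueAt(String path) {
        Change change = current.get(path);
        return change == null ? null : change.value;
    }

    /** Changes the merge chose not to apply, in arrival order. */
    List<Change> discardedChanges() {
        return Collections.unmodifiableList(discarded);
    }
}
```

A real implementation would have to persist the discarded changes and
expose them through some API, but the point is that discarding a change
from the merged state and discarding its content are separate decisions.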

BR,

Jukka Zitting
