jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thomas Müller <thomas.muel...@day.com>
Subject Re: [jr3] Clustering: Scalable Writes / Asynchronous Change Merging
Date Wed, 20 Oct 2010 09:22:03 GMT
Hi,

Let's discuss partitioning / sharding in another thread. Asynchronous
change merging is not about how to manage huge repositories (for that
you need partitioning / sharding), it's about how to manage cluster
nodes that are relatively far apart. I'm not sure if this is the
default use case for Jackrabbit. Traditionally, asynchronous change
merging (synchronizing) is only used if the subsystems are offline for
some time, or if there is a noticeable networking delay between them,
for example if cluster nodes are in different countries.

But I don't want that "the network is the new disk" (in terms of
bottleneck, in terms of performance problem). Networking delay may be
the bottleneck even if cluster nodes are in the same room, specially
when you keep the whole repository in memory, or use SSDs. Also,
computers get more and more cores, and at some point message passing
is more efficient than locking.

Asynchronous operation is bad for reservation systems, banking
applications, or if you can't guarantee sticky sessions. Here you need
synchronous operations or at least locking. If you want to support
both cases in the same repository, you could use virtual repositories
(which are also good for partitioning / sharding).

My proposal is for Jackrabbit 3 only. In the extreme case, the
"asynchronous change merger" might very well be a separate thread and
use little more than the JCR API. Therefore asynchronous change
merging should have very little or no effect on performance if it is
not used. On the other hand, replication should likely be in the
persistence layer. I think the persistence API should be synchronous
as it is now.

> We could also use normal UUIDs or SHA1 hashes of the serialized change sets

That's an option, but lookup by node id and time must be efficient.
UUIDs / secure hashes are not that space efficient (that might not be
the problem). We see from Jackrabbit that indexing random data (UUIDs)
is extremely bad for cache locality and index efficiency, but if
indexing is done by time then that's also not a problem. The algorithm
I propose is sensitive to configuration changes, but you only need to
change the formula when going from "max 256 cluster nodes" to "more
than 256 cluster nodes" (for example). And you need a unique cluster
id. But I don't think that's the problem.

> we could leverage a virtual time algorithm

I read the paper, but I don't actually understand how to implement it.

> We'll probably need some mechanism for making the content
> of conflicting changes available for clients to review event if the
> merge algorithm chooses to discard them.

If we leave it up to the client to decide what to do, then things
might more easily run out of sync. But in any case there might be
problems, for example synchronous event listeners might get a
different order of events in different cluster nodes (possibly even
different events). Probably it would make sense to add some kind of
offline comparison / sync feature, similar to rsync. Actually that
could be useful even for Jackrabbit 2.

Regards,
Thomas

Mime
View raw message