zookeeper-user mailing list archives

From Alexander Shraer <shra...@yahoo-inc.com>
Subject RE: Mounting a remote Zookeeper
Date Fri, 10 Jun 2011 00:10:57 GMT
This is a preliminary proposal, so everything is still open. Still, I think it has many
advantages over the previous namespace partitioning proposal
(http://wiki.apache.org/hadoop/ZooKeeper/PartitionedZookeeper), which AFAIK was never
implemented. The idea here is to make much smaller and more intuitive changes.

For example, the previous proposal did not offer any ordering guarantees across partitions.
Also, with a Linux mount you don't need to specify for each new file which mount point the
file belongs to - similarly, we can exploit the tree structure to infer this, instead of
creating and maintaining an additional hierarchy as in the previous proposal.

> what happens when a client does a read on the remote ZK cluster. does the read always
> get forwarded to the remote cluster?

No. The idea is to identify when inter-cluster communication is necessary to maintain
sequential consistency, and to avoid it otherwise. In the twiki we propose one possible
rule: for example, if you read from a remote partition that didn't mount any part of your
local namespace, it's OK to return an old value. In any case, the read is never forwarded
to the remote cluster - even when inter-cluster communication is necessary, we sync the
observer with the remote leader and then read from the observer.
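
To make the read path concrete, here is a rough sketch of the sync-then-read pattern using
the existing Java client API. The mount path /mnt/remote/config is made up, and today's
sync() only reaches the local leader - under the proposal it would have to be extended to
reach the remote one, but the client-side shape would be the same:

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.AsyncCallback.VoidCallback;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    class SyncThenRead {
        // Bring the server we are connected to (the local observer) up to
        // date with the leader, then serve the read locally.
        static byte[] readAfterSync(ZooKeeper zk, String path)
                throws Exception {
            final CountDownLatch synced = new CountDownLatch(1);
            zk.sync(path, new VoidCallback() {
                public void processResult(int rc, String p, Object ctx) {
                    synced.countDown(); // observer is now caught up
                }
            }, null);
            synced.await();
            // The read is served by the local observer, never forwarded.
            return zk.getData(path, false, new Stat());
        }
    }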

> in your proposal, what happens if a client creates an ephemeral
> node on the remote ZK cluster. who does the failure detection and clean up?

You're right, we should definitely address that in the twiki. I think that in any case a
cluster should only monitor the clients connected to it, not clients connected to remote
clusters. So if we support creating remote ephemeral nodes, failure detection should be
done locally, and the remote cluster should subscribe to the relevant local failure events
so it is notified (e.g., to clean up the node when the creator's session expires).
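
As a rough sketch of how this could look from the client side - the mount point /mnt/remote
and the expiry handling described in the comments are assumptions, not something the twiki
specifies yet:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs.Ids;
    import org.apache.zookeeper.ZooKeeper;

    class RemoteEphemeral {
        // The client talks to its local cluster, which owns the session and
        // therefore does the failure detection; on session expiry the local
        // cluster would delete the node and the remote cluster, having
        // subscribed to that local failure event, would be notified.
        static String create(ZooKeeper localZk, byte[] data)
                throws Exception {
            return localZk.create("/mnt/remote/workers/w-", data,
                    Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        }
    }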

> what happens if the request to the remote cluster hangs?

The user can choose what happens in this case. If they want all subsequent requests to
fail, a remote request that hangs will block all following requests. Otherwise, a remote
request can fail while subsequent local requests still succeed.
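
For instance, an application can already approximate the non-blocking policy with the
existing async API: a hanging or failed remote request surfaces as an error code in the
callback, while local operations issued afterwards proceed independently (the path below
is made up):

    import org.apache.zookeeper.AsyncCallback.StringCallback;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException.Code;
    import org.apache.zookeeper.ZooDefs.Ids;
    import org.apache.zookeeper.ZooKeeper;

    class NonBlockingRemoteWrite {
        static void createRemote(ZooKeeper zk, byte[] data) {
            zk.create("/mnt/remote/job", data, Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT, new StringCallback() {
                public void processResult(int rc, String path, Object ctx,
                                          String name) {
                    if (Code.get(rc) != Code.OK) {
                        // the remote request failed or timed out; local
                        // requests issued after it are unaffected
                        System.err.println("remote create failed: "
                                + Code.get(rc));
                    }
                }
            }, null);
            // Local requests issued after this point are not blocked.
        }
    }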

Thanks,
Alex

> -----Original Message-----
> From: Benjamin Reed [mailto:breed@apache.org]
> Sent: Thursday, June 09, 2011 4:05 PM
> To: user@zookeeper.apache.org
> Subject: Re: Mounting a remote Zookeeper
> 
> this is a small nit, but i think the partition proposal works a bit
> more like a mount point than your proposal. when you mount a file
> system, the mount isn't transparent. two mounted file systems can have
> files with the same inode number, for example. you also can't do some
> things like a rename across file system boundaries.
> 
> in your proposal, what happens if a client creates an ephemeral
> node on the remote ZK cluster. who does the failure detection and
> clean up? it also wasn't clear what happens when a client does a read
> on the remote ZK cluster. does the read always get forwarded to the
> remote cluster? also what happens if the request to the remote cluster
> hangs?
> 
> thanx
> ben
> 
> On Thu, Jun 9, 2011 at 11:41 AM, Alexander Shraer <shralex@yahoo-inc.com> wrote:
> > Hi,
> >
> > We're considering working on a new feature that will allow "mounting"
> > part of the namespace of one ZK cluster into another ZK cluster. The
> > goal is essentially to be able to partition a ZK namespace while
> > preserving current ZK semantics as much as possible.
> > More details are here:
> > http://wiki.apache.org/hadoop/ZooKeeper/MountRemoteZookeeper
> >
> > It would be great to get your feedback and especially please let us
> > know if you think your application can benefit from this feature.
> >
> > Thanks,
> > Alex Shraer and Eddie Bortnikov
