hadoop-zookeeper-user mailing list archives

From "Todd Greenwood" <to...@audiencescience.com>
Subject RE: Zookeeper WAN Configuration
Date Sun, 26 Jul 2009 18:05:25 GMT
Flavio, thank you for the suggestion.

I have looked at the documentation (relevant snippets pasted in below), and looked at the presentations
(http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations),
but I still have some questions about WAN configuration:

---------------------------------------------------------------
WAN
----
A <-> B
A <-> C
A <-> D

A is a central processing hub (DC).
B-D are remote colo edge nodes (PODS).
Each POD contains (m) ZK Servers with (q) client connections.
---------------------------------------------------------------
  
What are the advantages and disadvantages of co-locating ZK Servers across a WAN? Could you
correct my admittedly naïve assumptions here?

1. ZK Servers within a POD would significantly improve read/write performance within a given
POD, compared with clients within the POD opening connections to the DC.

2. ZK Servers within a POD would provide local file transacted storage of writes, obviating
the need to write that code ourselves.

3. ZK Servers within the POD would be resilient to network connectivity failure between the
POD and the DC. Once connectivity is re-established, the ZK Servers in the POD would sync with
the ZK servers in the DC and, from the perspective of a client within the POD, everything
would have just worked, as if there had been no network failure.

4. A WAN topology of co-located ZK servers in both the DC and (n) PODs would not significantly
degrade the performance of the ensemble, provided large blobs of traffic were not being sent
across the network.

--------------------
Doc references below
--------------------

http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html

"""
group.x=nnnnn[:nnnnn]

    (No Java system property)

    Enables a hierarchical quorum construction. "x" is a group identifier and the numbers following
the "=" sign correspond to server identifiers. The right-hand side of the assignment is a colon-separated
list of server identifiers. Note that groups must be disjoint and the union of all groups
must be the ZooKeeper ensemble.

weight.x=nnnnn

    (No Java system property)

    Used along with "group", it assigns a weight to a server when forming quorums. Such a
value corresponds to the weight of a server when voting. There are a few parts of ZooKeeper
that require voting such as leader election and the atomic broadcast protocol. By default
the weight of server is 1. If the configuration defines groups, but not weights, then a value
of 1 will be assigned to all servers.
"""
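
For illustration only, here is how I read the two options above as a zoo.cfg fragment. It is a sketch, not a tested configuration: it assumes nine servers whose ids 1 through 9 match existing server.N entries (not shown), split into three groups of three, with the default weight of 1 written out explicitly.

```
# Sketch: 9 servers in 3 disjoint groups (server ids are assumptions)
group.1=1:2:3
group.2=4:5:6
group.3=7:8:9

# Explicit weights; 1 is already the default when groups are defined
weight.1=1
weight.2=1
weight.3=1
weight.4=1
weight.5=1
weight.6=1
weight.7=1
weight.8=1
weight.9=1
```

The groups are disjoint and their union is the whole ensemble, as the snippet above requires.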

http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperInternals.html

"""
A different construction that uses weights and is useful in wide-area deployments (co-locations)
is a hierarchical one. With this construction, we split the servers into disjoint groups and
assign weights to processes. To form a quorum, we have to get a hold of enough servers from
a majority of groups G, such that for each group g in G, the sum of votes from g is larger
than half of the sum of weights in g. Interestingly, this construction enables smaller quorums.
If we have, for example, 9 servers, we split them into 3 groups, and assign a weight of 1
to each server, then we are able to form quorums of size 4. Note that two subsets of processes
composed each of a majority of servers from each of a majority of groups necessarily have
a non-empty intersection. It is reasonable to expect that a majority of co-locations will
have a majority of servers available with high probability.

With ZooKeeper, we provide a user with the ability of configuring servers to use majority
quorums, weights, or a hierarchy of groups.
"""
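
To check my reading of the quorum rule quoted above, here is a small Python sketch. It is not ZooKeeper code; the function names (is_quorum, smallest_quorum_size) are made up for this example, and it only encodes the rule as stated in the docs: a set of servers is a quorum if, in a majority of groups, the votes received outweigh half of that group's total weight.

```python
from itertools import combinations

def is_quorum(votes, groups, weights):
    """Return True if the set of server ids `votes` satisfies the
    hierarchical rule quoted from the docs (hypothetical helper)."""
    groups_won = 0
    for members in groups.values():
        total = sum(weights[s] for s in members)
        got = sum(weights[s] for s in members if s in votes)
        # "larger than half of the sum of weights in g"
        if 2 * got > total:
            groups_won += 1
    # a majority of groups must be won
    return 2 * groups_won > len(groups)

def smallest_quorum_size(groups, weights):
    """Brute-force the smallest number of servers that can form a quorum."""
    servers = sorted(weights)
    for size in range(1, len(servers) + 1):
        for candidate in combinations(servers, size):
            if is_quorum(set(candidate), groups, weights):
                return size
    return None

# The example from the docs: 9 servers, 3 groups, weight 1 each.
groups = {1: [1, 2, 3], 2: [4, 5, 6], 3: [7, 8, 9]}
weights = {s: 1 for members in groups.values() for s in members}

print(smallest_quorum_size(groups, weights))
print(is_quorum({1, 2, 4, 5}, groups, weights))
```

Two servers from each of two groups, e.g. {1, 2, 4, 5}, are enough under this rule, whereas a plain majority of 9 servers would need 5.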

-----Original Message-----
From: Flavio Junqueira [mailto:fpj@yahoo-inc.com] 
Sent: Saturday, July 25, 2009 7:55 AM
To: zookeeper-user@hadoop.apache.org
Subject: Re: Zookeeper WAN Configuration

Todd, you can try using flexible quorums to implement what you're
requesting. You can simulate the behavior I described of observers by
setting the weight of the server to zero. Please check the
documentation at:

	http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html

Under "Cluster Options", check options such as group and weight.

-Flavio


On Jul 24, 2009, at 5:03 PM, Todd Greenwood wrote:

>
> In the future, once the Observers feature is implemented, then we  
> should
> be able to deploy zk servers to both the DC and to the pods...with all
> the goodness that Flavio mentions below.
>
>
> -----Original Message-----
> From: Flavio Junqueira [mailto:fpj@yahoo-inc.com]
> Sent: Friday, July 24, 2009 4:50 PM
> To: zookeeper-user@hadoop.apache.org
> Subject: Re: Zookeeper WAN Configuration
>
> Just a few quick observations:
>
> On Jul 24, 2009, at 4:40 PM, Ted Dunning wrote:
>
>> On Fri, Jul 24, 2009 at 4:23 PM, Todd Greenwood
>> <toddg@audiencescience.com>wrote:
>>
>>> Could you explain the idea behind the Observers feature, what this
>>> concept is supposed to address, and how it applies to the WAN
>>> configuration problem in particular?
>>>
>>
>> Not really.  I am just echoing comments on observers from them that
>> know.
>>
>
> Without observers, increasing the number of servers in an ensemble
> enables higher read throughput, but causes write throughput to drop
> because the number of votes to order each write operation increases.
> Essentially, observers are zookeeper servers that don't vote when
> ordering updates to the zookeeper state. Adding observers enables
> higher read throughput while minimally affecting write throughput (leader
> still has to send commits to everyone, at least in the version we have
> been working on).
>
>>
>>> """
>>> The ideas for federating ZK or allowing observers would likely do
>>> what
>>> you
>>> want.  I can imagine that an observer would only care that it can  
>>> see
>>> its
>>> local peers and one of the observers would be elected to get updates
>>> (and
>>> thus would care about the central service).
>>> """
>>> This certainly sounds like exactly what I want...Was this
>>> introduced in
>>> 3.2 in full, or only partially?
>>>
>>
>> I don't think it is even in trunk yet.  Look on Jira or at the
>> recent logs
>> of this mailing list.
>
> It is not on trunk yet.
>
> -Flavio
>

