hadoop-zookeeper-user mailing list archives

From "Todd Greenwood" <to...@audiencescience.com>
Subject RE: Zookeeper WAN Configuration
Date Wed, 29 Jul 2009 18:28:04 GMT
Flavio -

Inline in the top snippet.

> We should have a twiki page on this. For now, you can find an example in
> the header of QuorumHierarchical.java.

[Todd] Got it, QuorumHierarchical.java comments are very clear.
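For reference, a minimal sketch of what such a hierarchical configuration
might look like in zoo.cfg. The host names, server ids, and the split into
one DC group and one POD group are assumptions made for illustration; the
group.N and weight.N key format should be checked against the header of
QuorumHierarchical.java for the release in use.

```
# Hypothetical ensemble: servers 1-3 in the DC, servers 4-5 in one POD.
server.1=dc-zk1.example.com:2888:3888
server.2=dc-zk2.example.com:2888:3888
server.3=dc-zk3.example.com:2888:3888
server.4=pod-zk1.example.com:2888:3888
server.5=pod-zk2.example.com:2888:3888

# Groups are colon-separated lists of server ids.
group.1=1:2:3
group.2=4:5

# DC servers vote; POD servers carry weight zero, so their votes do not count.
weight.1=1
weight.2=1
weight.3=1
weight.4=0
weight.5=0
```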

> 
> Also, I found a couple of bugs recently that may or may not affect your
> setup, so I suggest that you apply the patches in ZOOKEEPER-481 and
> ZOOKEEPER-479. We would like to have these patches in for the next
> release (3.2.1), which should be out in two or three weeks, if there is
> no further complication.
> 

[Todd] What is the recommended policy regarding patching zookeeper
locally? As an external user, should I patch and compile in the trunk or
in the branch (branch-3.2)? 

I've looked at :
http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute
http://wiki.apache.org/hadoop/HowToRelease

Both of these seem well thought out, but they are aimed at committers
committing to the trunk.

> Another issue that I realized that won't work in your case, but the fix
> would be relatively easy, is the guarantee that no zero-weight follower
> will be elected. Currently, we don't check the weight during leader
> election. I'll open a jira and put up a patch soon.

[Todd] Which source file(s) would this be in? I'll take a look at it.
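To make the weight discussion concrete, here is a rough sketch of how a
weighted hierarchical quorum check could work, and why a zero-weight server
should be excluded as a leader candidate. This is not the ZooKeeper code:
the function names, the group layout, and the rule that groups with zero
total weight are ignored are assumptions made for illustration.

```python
from typing import Dict, List, Set


def contains_quorum(acks: Set[int],
                    groups: Dict[int, Set[int]],
                    weights: Dict[int, int]) -> bool:
    """Return True if `acks` forms a quorum under the assumed rule:
    a strict majority of the voting groups must each have a strict
    majority of their weight acknowledged."""
    # Assumption: groups whose total weight is zero do not take part.
    voting_groups = [members for members in groups.values()
                     if sum(weights[s] for s in members) > 0]
    satisfied = 0
    for members in voting_groups:
        total = sum(weights[s] for s in members)
        acked = sum(weights[s] for s in members if s in acks)
        if 2 * acked > total:
            satisfied += 1
    return 2 * satisfied > len(voting_groups)


def eligible_leaders(weights: Dict[int, int]) -> List[int]:
    """Hypothetical filter: only servers with non-zero weight may lead."""
    return sorted(s for s, w in weights.items() if w > 0)


# Hypothetical layout: servers 1-3 in the DC, servers 4-5 in a POD.
groups = {1: {1, 2, 3}, 2: {4, 5}}
weights = {1: 1, 2: 1, 3: 1, 4: 0, 5: 0}

print(contains_quorum({1, 2}, groups, weights))     # two DC servers
print(contains_quorum({1, 4, 5}, groups, weights))  # one DC server plus the POD
print(eligible_leaders(weights))
```

Under these assumptions the POD servers never help or hinder a quorum, so
a leader elected from the POD would hold an office its own vote cannot
support, which is the case the proposed check is meant to rule out.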

-Todd

-----Original Message-----
From: Patrick Hunt [mailto:phunt@apache.org] 
Sent: Tuesday, July 28, 2009 9:50 AM
To: zookeeper-user@hadoop.apache.org
Subject: Re: Zookeeper WAN Configuration

Flavio, please enter a doc jira for this if there are no docs, it should
be in forrest, not twiki btw. It would be good if you could review the
current quorum docs (any type) and create a jira/patch that addresses
any/all shortfall.

Patrick

Flavio Junqueira wrote:
> Todd, Some more answers. Please check out carefully the information at
> the bottom of this message.
> 
> On Jul 27, 2009, at 4:02 PM, Todd Greenwood wrote:
> 
>>
>> I'm assuming that you're setting the weight of ZooKeeper servers in
>> PODs to zero, which means that their votes when ordering updates do
>> not count.
>>
>> [Todd] Correct.
>>
>> If my assumption is correct, then you should see a significant
>> improvement in read performance. I would say that write performance
>> wouldn't be very different from clients in PODs opening a direct
>> connection to DC.
>>
>> [Todd] So the Leader, knowing that machine(s) have a voting weight of
>> zero, doesn't have to wait for their responses in order to form a
>> quorum vote? Does the leader even send voting requests to the weight
>> zero followers?
>>
> 
> In the current implementation, it does. When we have observers 
> implemented, the leader won't do it.
> 
>>
>>
>>> 3. ZK Servers within the POD would be resilient to network
>>> connectivity failure between the POD and the DC. Once connectivity
>>> re-established, the ZK Servers in the POD would sync with the ZK
>>> servers in the DC, and, from the perspective of a client within the
>>> POD, everything just worked, and there was no network failure.
>>>
>>
>> We want to have servers switching to read-only mode upon network
>> partitions, but this is a feature under development. We don't have
>> plans for implementing any model of eventual consistency that would
>> allow updates even when not being able to form a quorum, and I
>> personally believe that it would be a major change, with major
>> implications not only to the code base, but also to the semantics of
>> our API.
>>
>> [Todd] What is the current (3.2) behaviour in the case of a network 
>> failure that prevents connectivity between ZK Servers in a pod? 
>> Assuming the pod is composed of weight=0 followers...are the clients 
>> connected to these zookeeper servers still able to read? do they get 
>> exceptions on write? do the clients hang if it's a synchronous call?
> 
> The clients won't be able to read because we don't have this feature of
> going read-only upon partitions.
> 
>>
>>
>>> 4. A WAN topology of co-located ZK servers in both the DC and (n)
>>> PODs would not significantly degrade the performance of the
>>> ensemble, provided large blobs of traffic were not being sent across
>>> the network.
>>
>> If the zk servers in the PODs are assigned weight zero, then I don't
>> see a reason for having lower performance in the scenario you
>> describe. If weights are greater than zero for zk servers in PODs,
>> then your performance might be affected, but there are ways of
>> assigning weights that do not require receiving votes from all co-
>> locations for progress.
>>
>> [Todd] Great, we'll proceed with hierarchical configuration w/ ZK
>> Servers in pods having a voting weight of zero. Could you provide a
>> pointer to a configuration that shows this? The docs are a bit lean in
>> this regard...
>>
> 
> We should have a twiki page on this. For now, you can find an example in
> the header of QuorumHierarchical.java.
> 
> Also, I found a couple of bugs recently that may or may not affect your
> setup, so I suggest that you apply the patches in ZOOKEEPER-481 and
> ZOOKEEPER-479. We would like to have these patches in for the next
> release (3.2.1), which should be out in two or three weeks, if there is
> no further complication.
> 
> Another issue that I realized that won't work in your case, but the fix
> would be relatively easy, is the guarantee that no zero-weight follower
> will be elected. Currently, we don't check the weight during leader
> election. I'll open a jira and put up a patch soon.
> 
> -Flavio
> 
> 
> 
