hadoop-zookeeper-user mailing list archives

From Martin Waite <waite....@googlemail.com>
Subject Re: Managing multi-site clusters with Zookeeper
Date Mon, 08 Mar 2010 19:18:46 GMT
Hi Patrick,

Thanks for your input.

I am planning on having 3 zk servers per data centre, with perhaps only 2 in
the tie-breaker site.

The traffic between zk and the applications will be lots of local reads -
"who is the primary database?".  Changes to the config will be rare (server
rebuilds, etc. - i.e. planned changes) or caused by server / network / site
failure.
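
For the read side, something like this is what I have in mind (a rough
sketch using the standard Java client; "/cluster/primary-db" is just a
made-up znode path holding the primary's address):

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class PrimaryLookup {
        public static void main(String[] args) throws Exception {
            // Connect string lists only the local colo's servers.
            ZooKeeper zk = new ZooKeeper("zk1.local:2181,zk2.local:2181",
                    30000, new Watcher() {
                        public void process(WatchedEvent e) { /* session events */ }
                    });
            Stat stat = new Stat();
            // A plain read, served by the local server we are connected to;
            // watch=true means we get notified when the primary changes.
            byte[] data = zk.getData("/cluster/primary-db", true, stat);
            System.out.println("primary: " + new String(data, "UTF-8"));
            zk.close();
        }
    }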

The interesting thing in my mind is how zookeeper will cope with inter-site
link failure - how quickly the remote sites will notice, and how quickly
normality can be resumed when the link reappears.
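
My (possibly naive) understanding of what a client sees when the link goes,
sketched with the standard Watcher interface - the distinction between
Disconnected and Expired looks like the important bit:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;

    public class LinkWatcher implements Watcher {
        public void process(WatchedEvent event) {
            switch (event.getState()) {
            case Disconnected:
                // Link or local server lost: the session may still be alive;
                // the client retries the other servers in its connect string.
                break;
            case SyncConnected:
                // Reconnected within the session timeout: resume as before.
                break;
            case Expired:
                // The outage outlasted the session timeout: ephemeral nodes
                // are gone and a brand-new ZooKeeper handle is needed.
                break;
            default:
                break;
            }
        }
    }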

I need to get this running in the lab and start pulling out wires.

regards,
Martin

On 8 March 2010 17:39, Patrick Hunt <phunt@apache.org> wrote:

> IMO latency is the primary issue you will face, but also keep in mind
> reliability w/in a colo.
>
> Say you have 3 colos (obviously it can't be 2): if you only have 3 servers,
> one in each colo, you will be reliable, but clients w/in each colo will have
> to connect to a remote colo if the local server fails. You will want to
> prioritize the local colo given that reads can be serviced entirely locally
> that way. If you have 7 servers (2-2-3) that would be better - if a local
> server fails you have a redundant one, and if both fail then you go remote.
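>
> A minimal sketch of that local-first setup (the hostnames are made up, and
> this assumes the standard Java client): list only the local colo's servers
> in the connect string, so the client only fails over among them.
>
>     import org.apache.zookeeper.WatchedEvent;
>     import org.apache.zookeeper.Watcher;
>     import org.apache.zookeeper.ZooKeeper;
>
>     // Clients in colo A are configured with colo A's servers only; the
>     // client picks one at random and fails over within this list.
>     ZooKeeper zk = new ZooKeeper(
>             "zk1.coloA:2181,zk2.coloA:2181", // local servers only
>             30000,                           // session timeout in ms
>             new Watcher() {
>                 public void process(WatchedEvent e) { /* session events */ }
>             });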
>
> You want to keep your writes as few as possible and as small as possible.
> Why? Say you have 100ms latency btw colos; let's go through a scenario for a
> client in a colo where the local servers are not the leader (zk cluster
> leader).
>
> read:
> 1) client reads a znode from local server
> 2) local server responds (usually < 1ms for "in colo" comms)
>
> write:
> 1) client writes a znode to local server A
> 2) A proposes change to the ZK Leader (L) in remote colo
> 3) L gets the proposal in 100ms
> 4) L proposes the change to all followers
> 5) all followers (not exactly, but hopefully) get the proposal in 100ms
> 6) followers ack the change
> 7) L gets the acks in 100ms
> 8) L commits the change (message to all followers)
> 9) A gets the commit in 100ms
> 10) A responds to client (< 1ms)
>
> write latency: 100 + 100 + 100 + 100 = 400ms
>
> Obviously keeping these writes small is also critical.
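>
> A quick way to see that number in the lab (just a sketch; the znode path is
> made up and the surrounding method is assumed to declare the checked
> exceptions) is to time a small write from the non-leader colo:
>
>     long start = System.currentTimeMillis();
>     // Four cross-colo hops (propose, broadcast, ack, commit) before the
>     // local server can answer - roughly 400ms at 100ms per hop.
>     zk.setData("/cluster/heartbeat", new byte[8], -1);
>     System.out.println("write took "
>             + (System.currentTimeMillis() - start) + "ms");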
>
> Patrick
>
>
> Martin Waite wrote:
>
>> Hi Ted,
>>
>> If the links do not work for us for zk, then they are unlikely to work with
>> any other solution - such as trying to stretch Pacemaker or Red Hat Cluster
>> with their multicast protocols across the links.
>>
>> If the links are not good enough, we might have to spend some more money to
>> fix this.
>>
>> regards,
>> Martin
>>
>> On 8 March 2010 02:14, Ted Dunning <ted.dunning@gmail.com> wrote:
>>
>>> If you can stand the latency for updates then zk should work well for you.
>>> It is unlikely that you will be able to do better than zk does and still
>>> maintain correctness.
>>>
>>> Do note that you can probably bias clients to use a local server. That
>>> should make things more efficient.
>>>
>>> Sent from my iPhone
>>>
>>>
>>> On Mar 7, 2010, at 3:00 PM, Mahadev Konar <mahadev@yahoo-inc.com> wrote:
>>>
>>>>> The inter-site links are a nuisance.  We have two data-centres with
>>>>> 100Mb links which I hope would be good enough for most uses, but we need
>>>>> a 3rd site - and currently that only has 2Mb links to the other sites.
>>>>> This might be a problem.
>>
